
Data Engineer

Bronx, NY
January 31, 2019



YUCHEN PENG, C: 862-***-****


• Programming Languages: Java, C/C++, Python, R, SQL, JavaScript

• IDEs: Cloud9, Brackets, RStudio, Spyder, Eclipse, Visual Studio, IntelliJ, Code::Blocks

• Databases: PostgreSQL, MySQL, MongoDB, HBase

• Frameworks/Tools: ReactJS, Hadoop, Express.js, Node.js, HTML, CSS, AWS, Linux, Cloudera, Spark, Pig, Hive, Sqoop, Docker, Tableau

Work Experience

Intern - Software Engineer 09/2018 to 12/2018

New Jersey Institute of Technology

• Developed a data analysis and visualization web platform for the City of Newark using Node.js, Express.js, and D3.js. Set up Redis as a cache layer for back-end/front-end separation.

• Implemented a distributed data processing and computing system with Spark-Python and Kafka-Python. Created data analysis models in a Tableau-Python environment.

• Created a scalable cloud deployment environment using Docker and the Mesos scheduling framework.

Summer Intern - Software Data Engineer 05/2018 to 09/2018

City of Newark / NJIT Ying Wu College of Computing, Newark, NJ

• Optimized data storage schema for HBase to speed up data query performance.

• Transformed raw data into a relational database with an ETL application to prepare unruly data for machine learning.

• Built an external data management application based on relational database logic, enabling users to join, search, import, and update data, and to translate addresses into latitude/longitude information for heat map visualization.

• Developed a data processing pipeline with Oozie. Ingested data with Sqoop and stored it in HDFS. Implemented interactive analysis using Impala and Hive in a Cloudera environment.


Real-time Log Analysis System 03/2018 to 04/2018

• Implemented real-time ingestion of 186 GB of online store access log data using Flink and Pig, storing the data on Hadoop. Queried the data interactively via Presto and visualized it in Superset.

• Processed data using Pig, Spark, and Hive, writing custom UDFs, loaders, and storers to implement business logic.

• Implemented a data processing pipeline using Oozie.

Big Data Analysis System of Cryptocurrency 12/2018 to 01/2019

• Implemented a high-performance data processing platform using Apache Kafka, Apache HBase, and Apache Spark to analyze cryptocurrency data.

• Developed a dashboard to visualize real-time cryptocurrency transaction data using Node.js and Redis.

• Optimized payload size using Google Protocol Buffers to improve system throughput by 30%.

Education and Training

Master of Science: Computer Science, GPA 3.75/4.00, Jan. 2018 - May 2019

New Jersey Institute of Technology

Bachelor of Science, Sep. 2007 - Jun. 2011

Shenzhen University
