Cheng(Max) Peng
*** ***** *** *, *******, WA ****9 +1-339-***-**** adjrcs@r.postjobfree.com
Applying for: Data Engineer / Software Engineer
Education
Northeastern University Sep 2018 – Dec 2020
Master of Science in Data Analytics and Engineering, GPA 3.7 Boston, MA
Courses: Algorithms, Object-Oriented Design (Java), Data Mining, Data Management and Database Design, Data Visualization
Huaqiao University Sep 2014 – Jun 2018
Bachelor of Engineering in Mechanical Engineering, GPA 3.6 Fujian, China
Courses: Calculus, Linear Algebra, Probability and Statistics, Image Processing (C++), Robotics (MATLAB)
Skills
Programming Languages: Python, Java, SQL, R, C++
Databases & Services: MySQL, SQL Server, Vertica, Hive, RabbitMQ, Redis, GCP, AWS
Frameworks & Tools: Apache Spark, Flask, NumPy, pandas, Matplotlib, ggplot2, stringr
Development & ETL: Docker, Git, Kronos, Airflow
ML Frameworks: PyTorch, scikit-learn, caret, nnet, dplyr
BI Software: Tableau, Looker, Spotfire, AtScale
Work Experience
Wayfair LLC Jan 2020 – Aug 2020
Data Engineer Intern Boston, MA
• Built dashboards in Looker and Tableau to monitor and analyze terabyte-scale logistics and clickstream data; generated and sent monthly analysis reports with Python scripts
• Developed and maintained ETL pipelines (Python and SQL) across Looker, GCP, and Vertica; built pivot tables, dimensions, and measures based on business metric definitions
• Participated in the cloud database migration project, building ETL procedures to transfer terabyte-scale data from Vertica to GCP
• Contributed to schema design and ETL implementation for the newly released HomeServices project; defined core metrics and dashboards with the analytics team
• Used PySpark to optimize terabyte-scale table procedures and pipelines
GE Healthcare Jul 2019 – Sep 2019
Data Analyst Intern Beijing, China
• Built Spotfire dashboards on terabyte-scale X-ray exam log data to monitor the condition of more than 6,500 machines worldwide and the behavior of their users
• Analyzed time-sequential text with Python and R, providing real-time analysis of streaming machine logs from the cloud covering 50,000 concurrent exams
• Generated insight reports from Hive data to help the reliability and service teams improve their workflows
• Used Python and Hive SQL to troubleshoot abnormal data generated by the machine OS, and worked with the software team to improve software robustness
Projects
ETL and sentiment analysis on tweets related to COVID-19 (Python) Nov 2020 – Dec 2020
• Set up a twint scraper and stored the dataset (TB+) in GCS using Airflow DAGs; built an operator to load the data into BigQuery
• Loaded the data into Spark through Optimus and performed sentiment analysis with the NLP tool TextBlob
• Built a Flask web application that connects to BigQuery and visualizes the analyzed data with Plotly in real time
Scalable Distributed Data System (Java) Sep 2020 – Nov 2020
• Designed a RESTful web application that allows ski resorts to collect and analyze lift usage data
• Implemented the server with Tomcat and deployed it on EC2 behind a load balancer to serve the application from the cloud
• Used RabbitMQ as a queueing solution for the servlet to reduce load on the relational database in the RDS instance
• Tested the system with a self-designed multithreaded client that auto-generates three thousand requests