Engineer Data Analyst

Location:
Seattle, WA
Posted:
January 27, 2021

Resume:

Cheng(Max) Peng

*** ***** *** *, *******, WA ****9 +1-339-***-**** adjrcs@r.postjobfree.com

Applying for: Data Engineer / Software Engineer

Education

Northeastern University Sep 2018 – Dec 2020

Master of Science in Data Analytics and Engineering, GPA 3.7 Boston, MA

Courses: Algorithms, Object-oriented design (Java), Data Mining, Data Management and Database Design, Data Visualization

Huaqiao University Sep 2014 – Jun 2018

Bachelor of Engineering in Mechanical Engineering, GPA 3.6 Fujian, China

Courses: Calculus, Linear Algebra, Probability and Statistics, Image Processing (C++), Robotics (MATLAB)

Skills

Programming languages: Python, Java, SQL, R, C++

Database & Service: MySQL, SQL Server, Vertica, Hive, RabbitMQ, Redis, GCP, AWS

Framework & Tool: Apache Spark, Flask, NumPy, Pandas, matplotlib, ggplot2, stringr

Development & ETL: Docker, Git, Kronos, Airflow

ML Framework: PyTorch, scikit-learn, CARET, nnet, dplyr

BI Software: Tableau, Looker, Spotfire, AtScale

Work Experience

Wayfair LLC Jan 2020 – Aug 2020

Data Engineer Intern Boston, MA

• Built dashboards in Looker and Tableau to monitor and analyze TB-scale logistics and clickstream data; sent out monthly analysis reports via a Python procedure

• Developed and maintained ETL pipelines (Python and SQL) among Looker, GCP, and Vertica; built pivot tables, dimensions, and measures based on business metric definitions

• Participated in the cloud database migration project; built ETL procedures to transfer TB-scale data from the Vertica database to GCP

• Contributed to schema design and ETL implementation for the newly released HomeServices project; defined core metrics and dashboards with the analytics team

• Used PySpark to optimize procedures and pipelines for large (TB-scale) tables

GE Healthcare Jul 2019 – Sep 2019

Data Analyst Intern Beijing, China

• Built dashboards in Spotfire from TB-scale X-ray exam log data to monitor the condition of more than 6,500 machines worldwide and the behavior of their users

• Analyzed time-sequential text with Python and R procedures, providing real-time analysis of machine logs streamed from the cloud for 50,000 concurrent exams

• Generated insight reports from the Hive database to support the reliability and service teams in improving workflows

• Used Python and Hive SQL to troubleshoot abnormal data generated by the machine OS; worked with the software team to improve software robustness

Projects

ETL and sentiment analysis on tweets relating to COVID-19 (Python) Nov 2020 – Dec 2020

• Set up a twint scraper and stored the TB+ dataset in GCS using Airflow DAGs; built an operator to load the data into BigQuery

• Transferred the data into Spark through Optimus and carried out sentiment analysis with the NLP tool TextBlob

• Built a Flask web application that connects to BigQuery and visualizes the analyzed data with Plotly in real time

Scalable Distributed Data System (Java) Sep 2020 – Nov 2020

• Designed a RESTful web application that allows ski resorts to collect and analyze lift usage data

• Implemented the server using Tomcat and deployed it on EC2 with a load balancer, enabling cloud-based services

• Used RabbitMQ as a queueing solution for the servlet to reduce load on the relational database on the RDS instance

• Tested with a self-designed multithreaded client that auto-generates three thousand requests


