Data Analyst Software Engineer

Location: Reston, VA

Posted: December 09, 2020

Resume:

Yinghai Yu

LinkedIn: https://www.linkedin.com/in/yinghai-yu-522aaa150

Email: ***********@*****.***

Portfolio: https://iyutpo.github.io/Yinghai-Yu/

Phone: +1-814-***-****

Washington, D.C.

HIGHLIGHTS

● Languages: Python, Java, JavaScript, Node.js, R, MATLAB, C++

● Databases: PostgreSQL, MongoDB, MySQL, SQL Server

● Tools: Azure, AWS, S3, EMR, Apache Spark, Tableau, Power BI, ArcGIS, TensorFlow, H2O, Git, Databricks

● Certifications: AWS Certified Solutions Architect - Associate

EDUCATION

The Georgia Institute of Technology, Atlanta, GA

M.S. in Computer Science, incoming student, anticipated enrollment date: 01/2021

The Pennsylvania State University, University Park, PA

M.S. in Civil (Transportation) Engineering, Computational Science, GPA: 3.63/4.0, 08/2017 – 08/2019

Shandong University of Technology, Zibo, China

B.S. in Civil (Transportation) Engineering, Ranking: 6/83, GPA: 3.5/4.0, 09/2013 – 06/2017

PROFESSIONAL EXPERIENCE

Weris Inc, Washington, D.C., United States

Senior Data Analyst, 11/2019 – Present

Assisted the Virginia Department of Health in designing a COVID-19 dashboard

● Deployed regression and time-series models to predict cumulative confirmed cases in VA, MD, and D.C. using SciPy.

● Defined an Rt (effective reproduction number) metric to monitor how quickly COVID-19 is spreading.

● Designed a dashboard in Power BI and deployed the Python code, database, and dashboard on Azure.

● Assisted the front-end software engineering team in embedding the dashboard in Weris Inc's website.

Software Engineer, 11/2019 – Present

Assisted M&T Bank in handling transaction data and provided software assistance

● Worked closely with clients from M&T Bank; configured and used EC2 instances on AWS.

● Extracted users' transaction data from the M&T Bank database, then transformed and loaded it into a data warehouse.

● Performed testing to verify format consistency and data security after generating ACH files using Java.

Data Analyst, 11/2019 – Present

Built and tested a dynamic traffic tolling system for the I-66 Beltway for VDOT

● Established an ETL pipeline to collect million-row I-66 traffic data from the VDOT database using SQLAlchemy.

● Defined KPIs (speed, density, volume) to evaluate real-time traffic conditions after aggregating data from the database.

● Designed a dashboard in Power BI to summarize overall traffic conditions on I-66 and delivered reports to VDOT.

Tencent PTA Internship, Shenzhen, China

Data Science Intern, 06/2019 – 08/2019

Tencent Android App Store market share investigation

● Conducted a market survey on the market share and DAU of mobile app stores, including the Tencent app store.

● Wrote SQL scripts to query data from the database and ranked app stores by downloads, DAU, and MAU.

● Visualized the queried data in a Tableau dashboard, generated survey reports, and submitted them to product managers.

Global AI Big Data Company, Artificial Intelligence, Natural Language Processing in Finance, New York, United States

Quantitative Strategy Intern, 08/2018 – 12/2018

Provided the R&D team with reliable data and modeling assistance

● Refactored Python Scrapy code to establish a web crawler and reduced memory cost by using a DFS algorithm.

● Built an ETL pipeline to collect greenhouse gas emission and time-series data (~6 GB) from websites.

● Initialized a Markov transition matrix and ran simulations using a Markov regime-switching model for GDP nowcasting, reducing short-term GDP prediction error by ~3%.

RESEARCH & PROJECTS

Stock Prediction Based on LSTM Using TensorFlow, Course Project, 08/2018 – 06/2019

● Performed EDA and analyzed the volatility pattern of stock prices using 5-year (million-row) stock data in Python.

● Established a deep learning time-series model (LSTM) to predict the price of MMM stock using TensorFlow.

● Tuned the LSTM model's hyperparameters (e.g., activation and regularization) using TensorFlow on GPU.

● Reduced prediction error from 5.98 to 3.56 USD after deploying the fine-tuned LSTM model with a 7-day window size.

San Francisco Crime Analysis in Apache Spark, Kaggle Project, 12/2018 – 03/2019

● Gathered 15-year (million-row) San Francisco crime geospatial data and built a data processing pipeline based on Apache Spark RDD, DataFrame, and Spark SQL for big data online analytical processing (OLAP).

● Clustered crimes using the K-Means algorithm and found that the northeastern region has the highest crime frequency.

● Trained and fine-tuned an ARIMA model to forecast the number of assault crimes per month in San Francisco.

● Created an animation to visualize assault crime frequency and help advise visitors of crime risk in San Francisco.
