Data Engineer

Location:

Arlington, VA

Salary:

60,000

Posted:

September 08, 2020

Resume:

Ziqiu Zhu

Address: Washington, DC, ***** Cell: 202-***-**** *************@*****.***

EDUCATION

George Washington University, Washington, DC. Completed: May 2020
Master of Data Analytics (Horned Student Scholarship), GPA: 3.8/4.0

Rutgers University, New Brunswick, NJ. Completed: Jan. 2018
Bachelor of Computer Science, GPA: 3.5/4.0

Coursework: Machine Learning, NLP, Data Mining, Big Data Analytics, Deep Learning, Data-Driven.

PUBLICATIONS

Matching Algorithms for Taxi-Hailing Problem. Guantao Zhao, Yinan Sun, Ziqiu Zhu, and Amrinder Arora. Future Technologies Conference 2020 (FTC 2020).

WORK EXPERIENCE

The GW University, Research Assistant, Sep. 2019-June 2020

• Implemented a customer-oriented taxi-hailing matching algorithm to improve user experience.

• Reduced average passenger waiting time to under 15 minutes by optimizing the path-planning and matching algorithms, and added a penalty term to penalize long-waiting-time cases.

• Promoted a carpool system to help customers save on trip costs, and adjusted a threshold to control the total time passengers spend on a trip.

• Simulated a real-world traffic system; results showed that our algorithm saved vehicle companies $37,500/day and increased the vehicle-to-passenger match ratio (the rate at which a driver or driverless vehicle is matched to a passenger).

Lenovo, Data Engineer, May 2019-Oct. 2019

• Built an ETL pipeline to automatically extract buzzwords and buzzword-news and to automatically create the hot-news-predictor table.

• Worked in a cross-functional team to design a Buzzword Web Crawler, based on current market trends, that scrapes the real-time buzzword dashboards of the most popular social platforms in China, such as Baidu, Sougou, SinaMicroBlog, and Wangyi.

• Completed Lenovo’s Buzzword Scraper & Hot News Predictor, which detects 97% of buzzwords and buzzword-news.

Two thousand Machinery, E-commerce Intern, Jan. 2018-Oct. 2018

• Applied search engine optimization (SEO) rules to improve the company’s page rank on the major global search engines.

• Analyzed daily clicks/traffic and weekly site/traffic on the company homepage, and cooperated with back-end developers to promote an ad-hoc plan to increase DAU clicks/impressions.

• Increased traffic from 1400 views to 3000 views in September.

• Achieved a top-3 rank on Google for the exact-keyword search “Chinese Commercial Kitchen Equipment”.

SKILLS

• Programming Skills: Python, Java, HTML, CSS, JS.

• Machine Learning: Logistic Regression, Decision Tree, Random Forest, Regularization, Boosting, Neural Network.

• Statistics: EDA, A/B test, Hypothesis Test, Model Evaluation & Measurement, Data Visualization.

• Big Data Technologies: SQL, Keras, AWS, Spark, MapReduce, PyTorch.

PROJECTS

Fraud Detection and Risk Analysis, July 2020

• Built a machine learning pipeline in Python to detect first fraudulent transactions and deployed an alert system to prevent potential fraudulent activities.

• Performed EDA and pandas profiling on 138K+ transactions to check for NaN values, duplicates, and data distribution; encoded categorical features and handled imbalanced labels with the SMOTE algorithm.

• Built a statistical analysis model based on a Gaussian distribution to detect fraudulent transactions with a precision score of 0.77.

• Built Logistic Regression and Random Forest models, evaluated them with 10-fold cross-validation, and tuned hyperparameters via grid search.

Bank Customer Churn Prediction, June 2020

• Completed a supervised machine learning (SML) algorithm to predict customer churn rate.

• Dove deep into column features to examine data distribution and consistency, and pre-processed the dataset with data cleaning and normalization.

• Trained Logistic Regression, K-Nearest Neighbors, and Random Forest models to predict churn rate, then compared and evaluated them on Accuracy, Precision, Recall, and F-Measure.

• Applied regularization to deal with overfitting; overall, the RF model performed best, with 0.85 accuracy in predicting customer churn and 100% recall.

Recommender System - Personalized Movie Recommendation, May 2020

• Designed a personalized recommender system to recommend movies based on user preferences.

• Developed Collaborative Filtering algorithms (both Item-Item and User-User) by calculating Pearson (centered cosine) similarity with weighted K-nearest users/items to get prediction scores.

• Model results showed that the Item-Item measure outperformed User-User in 100K user cases.

• Implemented a KNN model, a Matrix Factorization algorithm, and a CNN deep learning model to predict user ratings.

• Built a data ETL pipeline to analyze the movie rating dataset and conducted OLAP with Spark SQL.

• Applied an ALS model to provide personalized movie recommendations and developed user-based measures to handle cold-start problems.
