Zhengqing Gao
Evanston, Chicago +1-312-***-**** **************@*****.***
LinkedIn: www.linkedin.com/in/zhengqing-gao-834b50175
EDUCATION
Northwestern University Evanston, IL
Master of Science in Transportation Engineering GPA: 3.7 Expected Mar 2021
Technion-Israel Institute of Technology Israel
Bachelor of Science in Civil & Environmental Engineering GPA: 91.2/100 Aug 2015 – Aug 2019
SKILLS
Programming languages: Python, SQL, Spark, PyTorch, C, R, MATLAB
Tools: Tableau
Machine Learning: Regression, Classification, Clustering, Decision Tree, Random Forest, GBDT, PCA, LDA, CNN, RNN
Statistical Analysis: Exploratory Data Analysis, A/B Testing, Experimental Design, Hypothesis Testing, Text Mining
WORK EXPERIENCE
Mount Morning Capital Beijing, China
Investment Assistant Dec 2019 – July 2020
● Conducted due diligence and built financial forecast models, facilitating investments in 3 startups at the Series A/B stage for a total of 6 million in financing.
● Researched markets, business models, and innovative technologies in the IoT and connected-vehicle fields; reported monthly to the investment committee.
PROJECTS
Chicago Crime Analysis in Apache Spark July 2020 – Aug 2020
● Performed spatial and time-series analysis on a 10-year dataset of reported incidents from the Chicago Police Department.
● Built a pipeline on Spark RDD, DataFrame, and Spark SQL for big-data OLAP.
● Explored and visualized how the spatial distribution of incidents varied over time.
Taxi Driver Safety Scoring and Risk Analysis June 2020 – July 2020
● Extracted driving behavior from trajectory data, assessed driver risk, and built a model to predict it.
● Performed data preprocessing and feature extraction in Python; applied the SMOTE technique to oversample the minority class.
● Trained XGBoost and multivariate normal (GMM) models for driver score prediction and dangerous-driver detection.
● Built a following system that constrains dangerous drivers to follow the majority/safest drivers; simulation showed an 80% reduction in potential collisions.
Customer Churn Prediction and Analysis in the Banking Industry June 2020 – July 2020
● Developed algorithms in Python to predict customer churn from labeled data.
● Conducted data cleaning, categorical feature encoding, feature standardization, and data splitting.
● Trained supervised learning models including Logistic Regression, Random Forest, and KNN; applied regularization and used grid search to find optimal hyperparameters and reduce overfitting.
● Selected the model by comparing precision, recall, ROC curves, and AUC across all models via k-fold cross-validation; chose the Random Forest model, with a final AUC of 0.83.
Natural Language Processing and Topic Modeling on User Review Dataset May 2020 – June 2020
● Clustered customer reviews into groups and analyzed their latent semantic structure in Python.
● Preprocessed documents with tokenization and stemming, and built the feature space using Term Frequency-Inverse Document Frequency (TF-IDF).
● Trained two unsupervised learning models, K-means and Latent Dirichlet Allocation, and identified the most representative keywords and topics for each cluster.
Movie Recommendation Engine Development in Apache Spark April 2020 – May 2020
● Built an ETL pipeline to analyze a movie rating dataset; conducted OLAP and built a recommendation system with Spark.
● Performed data preprocessing and statistical exploration; implemented matrix-based approaches and an Alternating Least Squares model to provide personalized movie recommendations.
● Tuned model hyperparameters with Spark ML's cross-validation tools.