Driver Python

Location:

Wilmette, IL, 60091

Posted:

October 06, 2020


Resume:

Zhengqing Gao

Evanston, Chicago +1-312-***-**** **************@*****.***

LinkedIn: www.linkedin.com/in/zhengqing-gao-834b50175

EDUCATION

Northwestern University Evanston, IL

Master of Science in Transportation Engineering GPA: 3.7 Expected Mar 2021

Technion-Israel Institute of Technology Israel

Bachelor of Science in Civil & Environmental Engineering GPA: 91.2/100 Aug 2015 – Aug 2019

SKILLS

Programming languages: Python, SQL, Spark, PyTorch, C, R, MATLAB

Tools: Tableau

Machine Learning: Regression, Classification, Clustering, Decision Tree, Random Forest, GBDT, PCA, LDA, CNN, RNN

Statistical Analysis: Exploratory Data Analysis, A/B Testing, Experimental Design, Hypothesis Testing, Text Mining

WORK EXPERIENCE

Mount Morning Capital Beijing, China

Investment Assistant Dec 2019 – July 2020

● Conducted due diligence and built financial forecast models, facilitating investments in 3 startups at the Series A/B stage for a total of 6 million in financing.

● Researched markets, business models, and innovative technologies in the IoT and connected-vehicle fields. Reported monthly to the investment committee.

PROJECTS

Chicago Crime Analysis in Apache Spark July 2020 – Aug. 2020

● Performed spatial and time series analysis on a 10-year dataset of reported incidents from the Chicago Police Department.

● Built the pipeline on Spark RDD, DataFrame, and Spark SQL for big-data OLAP.

● Explored and visualized how the spatial distribution of incidents varied over time.

Taxi Driver Safety Scoring and Risk Analysis June 2020 – July 2020

● Extracted driving behavior from trajectory data, assessed driver risk, and built a model to predict driver risk.

● Performed data preprocessing and feature extraction in Python. Applied the SMOTE technique to augment the minority class.

● Trained XGBoost and multivariate normal (GMM) models for driver score prediction and dangerous driver detection.

● Built a following system that constrains dangerous drivers to follow the majority/safest drivers; simulation showed an 80% reduction in potential collisions.

Customer Churn Prediction and Analysis in Bank Industry June 2020 – July 2020

● Developed algorithms in Python to predict customer churn from labeled data.

● Conducted data cleaning, categorical feature encoding, feature standardization, and data splitting.

● Trained supervised learning models including Logistic Regression, Random Forest, and KNN, and applied regularization with Grid Search to find optimal hyperparameters and reduce overfitting.

● Selected the model by comparing precision, recall, ROC, and AUC across all models via k-fold cross-validation. Chose the Random Forest model, with a final AUC of 0.83.

Natural Language Processing and Topic Modeling on User Review Dataset May 2020 – June 2020

● Clustered customer reviews into groups and analyzed their latent semantic structures in Python.

● Preprocessed documents by tokenization and stemming, and built the feature space using Term Frequency – Inverse Document Frequency (TF-IDF).

● Trained two unsupervised learning models, K-means and Latent Dirichlet Allocation, and identified the most representative keywords and topics for each cluster.

Movie Recommendation Engine Development in Apache Spark April 2020 – May 2020

● Built an ETL pipeline to analyze a movie rating dataset. Conducted OLAP and built a recommendation system with Spark.

● Performed data preprocessing and statistical exploration. Implemented matrix-based approaches and an Alternating Least Squares model to provide personalized movie recommendations.

● Tuned model hyperparameters with the Spark ML cross-validation tools.
