Data Python

Location:

Champaign, IL

Posted:

January 22, 2021


Resume:

Simona (Jingyu) Zhang

Chicago, IL 217-***-**** *********@*****.***

LinkedIn: http://www.linkedin.com/in/jingyu-zhang-ba6648161

EDUCATION

University of Chicago, Chicago, IL December 2021

Master of Science in Analytics

GPA: 3.77/4.0

University of Illinois at Urbana-Champaign, Champaign, IL May 2020

Bachelor of Science in Statistics

Bachelor of Science in Agricultural and Consumer Economics

GPA: 3.82/4.0

SKILLS

• Programming Skills: Python (numpy, pandas, sklearn, matplotlib), R, SQL, SAS

• Machine Learning: Classical & Penalized Regression Methods (Lasso, Ridge), Decision Tree, Random Forest, K-Nearest Neighbors, Clustering, K-Means, Principal Component Analysis (PCA)

• Office: Advanced Excel, Tableau, PowerPoint

• Statistical Analysis: Hypothesis Testing, A/B Testing

WORK EXPERIENCE

MTY Group, Inc. New York, NY October 2019 – December 2019

Marketing Intern

• Applied machine learning classification models to identify high-value customers, conducted cross-sectional analysis utilizing random forest regression, and detected influential factors

• Visualized the impact of factors using the R libraries dplyr and ggplot2

Department of Agricultural and Consumer Economics, Champaign, IL February 2018 – December 2019

Research Assistant for Professor Gary Donald Schnitkey and Professor Mindy L. Mallory

• Participated in various projects on crop price prediction and insurance premium rate forecasting; read research papers on statistical learning and simulated algorithms in R

• Applied various regression machine learning models to prediction problems, selected and validated models through 5-fold cross-validation, and wrote reports on model performance

• Visualized the impact of factors using the R libraries dplyr and ggplot2

PROJECTS

Natural Language Processing and Topic Modeling on User Review Dataset

• Used Python to cluster customer reviews into groups and learn their hidden semantic structures

• Preprocessed review texts by removing stop words, tokenizing, and stemming, then extracted features using Term Frequency-Inverse Document Frequency (TF-IDF)

• Trained unsupervised K-means clustering models

• Identified latent topics and keywords of each review for clustering

Customer Churn Prediction and Analysis

• Improved algorithms for a bank to predict customer churn probability from labeled data using Python

• Preprocessed the data set through data cleaning, categorical feature transformation, and standardization

• Trained supervised machine learning models such as Logistic Regression, Random Forest, and K-Nearest Neighbors, and applied regularization with optimal parameters to reduce overfitting

• Evaluated classification performance (F1 score 0.72) by k-fold cross-validation and analyzed feature importance to identify the top factors that influenced the results

San Francisco Crime Analysis in Apache Spark

• Performed spatial and time series analysis on a 15-year dataset of reported incidents from SFPD

• Built a data processing pipeline based on DataFrame and Spark SQL for big data OLAP

• Explored and visualized the variation in the spatial distribution of incidents over time

Financial Anomaly Detection and Risk Analysis

• Developed a machine learning model in Python to predict fraudulent transactions

• Performed exploratory data analysis on 138K+ transactions and preprocessed the data by removing duplicates, encoding categorical features, and handling imbalanced labeled data with SMOTE

• Built logistic regression and random forest models and evaluated them via 10-fold cross-validation

• Selected the best model based on AUC (best AUC 0.87)
