Simona (Jingyu) Zhang
Chicago, IL 217-***-**** *********@*****.***
LinkedIn: http://www.linkedin.com/in/jingyu-zhang-ba6648161
EDUCATION
University of Chicago, Chicago, IL December 2021
Master of Science in Analytics
GPA: 3.77/4.0
University of Illinois at Urbana-Champaign, Champaign, IL May 2020
Bachelor of Science in Statistics
Bachelor of Science in Agricultural and Consumer Economics
GPA: 3.82/4.0
SKILLS
• Programming Skills: Python (numpy, pandas, sklearn, matplotlib), R, SQL, SAS
• Machine Learning: Classical & Penalized Regression Methods (Lasso, Ridge), Decision Tree, Random Forest, K-Nearest Neighbors, K-Means Clustering, Principal Component Analysis (PCA)
• Office: Advanced Excel, Tableau, PowerPoint
• Statistical Analysis: Hypothesis Testing, A/B Testing
WORK EXPERIENCE
MTY Group, Inc., New York, NY October 2019 – December 2019
Marketing Intern
• Applied machine learning classification models to identify high-value customers, conducted cross-sectional analysis utilizing random forest regression, and detected influential factors
• Visualized the impact of factors using the R libraries dplyr and ggplot2
Department of Agricultural and Consumer Economics, Champaign, IL February 2018 – December 2019
Research Assistant for Professor Gary Donald Schnitkey and Professor Mindy L. Mallory
• Participated in various projects on crop price prediction and insurance premium rate forecasting; read research papers on statistical learning and simulated algorithms in R
• Applied various regression models to prediction problems, selected and validated models through 5-fold cross-validation, and reported on model performance
• Visualized the impact of factors using the R libraries dplyr and ggplot2
PROJECTS
Natural Language Processing and Topic Modeling on a User Review Dataset
• Used Python to cluster customer reviews into groups and learn their hidden semantic structures
• Preprocessed review texts by tokenizing, removing stop words, and stemming, then extracted features using Term Frequency-Inverse Document Frequency (TF-IDF)
• Trained unsupervised K-means clustering models
• Identified latent topics and keywords of each review for clustering
Customer Churn Prediction and Analysis
• Improved algorithms for a bank to predict customer churn probability from labeled data using Python
• Preprocessed the dataset through data cleaning, categorical feature transformation, and standardization
• Trained supervised machine learning models such as Logistic Regression, Random Forest, and K-Nearest Neighbors, and applied regularization with optimal parameters to reduce overfitting
• Evaluated classification performance (F1 score 0.72) by k-fold cross-validation and analyzed feature importance to identify the top factors influencing the results
San Francisco Crime Analysis in Apache Spark
• Performed spatial and time series analysis on a 15-year dataset of reported incidents from the SFPD
• Built a data processing pipeline based on DataFrame and Spark SQL for big data OLAP
• Explored and visualized the variation of the spatial distribution of incidents over time
Financial Anomaly Detection and Risk Analysis
• Developed a machine learning model in Python to predict fraudulent transactions
• Performed exploratory data analysis on 138K+ transactions and preprocessed the data by removing duplicates, encoding categorical features, and handling imbalanced labels with SMOTE
• Built logistic regression and random forest models and evaluated them via 10-fold cross-validation
• Selected the best model based on AUC (best AUC 0.87)