Junchi (Jerry) Zhang
************@*****.*** 857-***-**** Arlington, VA, 22202
Education Background
George Washington University, Columbian College of Arts & Sciences Washington, D.C.
- Master of Science in Statistics (expected May 2020)
Boston University, College of Arts & Sciences Boston, MA
- Bachelor of Arts in Mathematics Sep 2014 – May 2018
- Bachelor of Arts in Economics
Data Analytics and Language Competency
Technical Skills: R, Python, SAS, SQL, Excel.
Related Courses: Probability Theories, Applied Linear Models, Data Analysis with SAS, Statistical Data Mining with R, Data Warehousing, Natural Language Processing, Applied Multivariate Analysis, Categorical Data Analysis.
Language Skills: Chinese (native), English (fluent).
Work Experience
Shanghai Jiaotong University
Summer Research Assistant Shanghai, China (Jun-Jul 2019)
- Researched virtual currency and evaluated reinforcement learning methods for price analysis.
- Processed and transformed data using NumPy and Pandas.
- Visualized datasets using Matplotlib.
J&K Investment Holding Group
Summer Intern, Department of Quantitative Investment Shanghai, China (Jul-Sep 2018)
- Assisted full-time employees with news extraction, data processing, and investment strategy research.
- Cleaned macroeconomic data and converted data formats using Excel and R.
- Applied linear regression to identify useful indicators for portfolio construction.
Project Experience
Toxic Comments Analysis Nov 2019
- Identified toxic comments and phrase structures using NLP methods.
- Used NLTK, NumPy, Pandas, and spaCy for data processing, including stop-word removal and sentence tokenization.
- Applied a Naïve Bayes classifier to extract a list of toxic words from all comments, then identified frequently occurring phrases based on that word list.
Wrongful Conviction Analysis Apr 2019
- Built statistical models to find potential causes of wrongful conviction.
- Processed the data in SAS and built linear and nonlinear multivariate regression models.
- Analyzed the significance of each independent variable using hypothesis testing.
Customer Revenue Analysis Apr 2019
- Built generalized linear models, XGBoost, and gradient boosting machine models using R packages to predict the natural log of the sum of all transactions per user. Assessed and compared models using root mean square error.
- Visualized data using R packages to support feature selection and discover potential relationships between variables.
- Used correlation analysis and PCA to select features by importance and then reduce dimensionality.
Duplicate Advertisement Mar 2019
- Built classification models to identify duplicate ads and evaluated them using AUC scores.
- Applied commonly used classification models using R packages: LDA, QDA, logistic regression, SVM, and random forests.
Life Expectancy Analysis Nov 2018
- Studied the factors affecting life expectancy and predicted life expectancy based on these factors.
- Researched related variables and selected among them in R using ridge regression, lasso regression, stepwise selection, correlation coefficients, and hypothesis tests.
- Developed linear and nonlinear models to predict life expectancy and improved model performance using cross-validation, R-squared, MSE, and residual plots.