Data Assistant

Location:
Arlington, VA
Posted:
March 01, 2020

Resume:

Junchi (Jerry) Zhang

************@*****.*** 857-***-**** Arlington, VA, 22202

Education Background

George Washington University, Columbian College of Arts & Sciences Washington, D.C.

- Master of Science in Statistics (expected May 2020)

Boston University, College of Arts & Sciences Boston, MA

- Bachelor of Arts in Mathematics Sep 2014 – May 2018

- Bachelor of Arts in Economics

Data Analytics and Language Competency

Technical Skills: R, Python, SAS, SQL, Excel.

Related Courses: Probability Theories, Applied Linear Models, Data Analysis with SAS, Statistical Data Mining with R, Data Warehousing, Natural Language Processing, Applied Multivariate Analysis, Categorical Data Analysis.

Language Skills: Chinese (native), English (fluent).

Work Experience

Shanghai Jiaotong University

Summer Research Assistant Shanghai, China (Jun-Jul 2019)

- Researched visual currency and evaluated reinforcement learning methods for price analysis.

- Processed and transformed data using NumPy and Pandas.

- Visualized datasets using Matplotlib.

J&K Investment Holding Group

Summer Intern, Department of Quantitative Investment Shanghai, China (Jul-Sep 2018)

- Assisted full-time employees with news extraction, data processing, and investment strategy research.

- Cleaned macroeconomic data and converted data formats using Excel and R.

- Applied linear regression to identify useful indicators for portfolio building.

Project Experience

Toxic Comments Analysis Nov 2019

- Identified toxic comments and phrase structures using NLP methods.

- Used NLTK, NumPy, Pandas, and spaCy for data processing, including stop-word removal and sentence tokenization.

- Applied a Naïve Bayes classifier to extract a list of toxic words from all comments, then identified frequently occurring phrases based on that word list.

Wrongful Conviction Analysis Apr 2019

- Built statistical models to identify potential causes of wrongful convictions.

- Used SAS to process the data and built linear and nonlinear multivariate regression models.

- Analyzed the significance of each independent variable using hypothesis testing.

Customer Revenue Analysis Apr 2019

- Built generalized linear models, XGBoost models, and gradient boosting machine models using R packages to predict the natural log of the sum of all transactions per user. Assessed and compared the models using root mean square error.

- Visualized data to guide feature selection and discover potential relationships between variables using R packages.

- Used correlation analysis and PCA to select factors by feature importance and then reduce dimensionality.

Duplicate Advertisement Mar 2019

- Built classification models to identify duplicate ads and evaluated them using AUC scores.

- Applied commonly used classification models using R packages: LDA, QDA, logistic regression, SVM, and random forests.

Life Expectancy Analysis Nov 2018

- Studied the factors affecting life expectancy and attempted to predict life expectancy from these factors.

- Identified related variables and selected among them in R using ridge regression, lasso regression, stepwise selection, correlation coefficients, hypothesis tests, and other methods.

- Developed linear and nonlinear models to predict life expectancy and improved model performance using cross-validation, R-squared, MSE, residual plots, and other diagnostics.