
Data Analysis

Washington, District of Columbia, United States
April 17, 2018



Zheng Lyu

**** *. **** ****** *********, VA ***** 202-***-****


THE GEORGE WASHINGTON UNIVERSITY, Washington, D.C.

Master of Science in Statistics, Expected 05/2018

GPA: 3.70/4


Bachelor of Economics in Economic Statistics, 06/2016

GPA: 3.75/4


• Programming languages: R, Python, SQL;

• Data Visualization: R with ggplot2, Excel (pivot table/plot), Tableau, QGIS;

• Data cleaning: Excel, R.


Analysis of PM2.5 in China 09/2017-12/2017

• Built a Linear Regression Model to identify factors that influence PM2.5;

• Used Time Series Analysis to find the trend and seasonal components of PM2.5;

• Used Cross Validation to choose the best model among SARIMA, ARIMA, and Holt-Winters for forecasting PM2.5.

Analysis of Shots of Kobe 10/2017-12/2017

• Used LASSO and Forward and Backward Stepwise Selection to select the variables for the subsequent models;

• Conducted Cluster Analysis, dividing the data into different numbers of clusters to look for patterns in the dataset;

• Built a range of models, including Logistic Regression (with interaction terms, with splines), LASSO Regression, Ridge Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Support Vector Machines with different kernels, Decision Tree, Random Forest, and Naive Bayes, and compared their CV (cross-validation) Error Rate and AUC (area under the curve) to find the best model;

• Found that Kobe is not good at shooting from the left side of the rim and that, in the last 3-4 minutes of a game, defense is more important than offense.


WORK EXPERIENCE

GWU Hospital 10/2017-12/2017

Statistical Consultant

• Built a Logistic Regression model to identify factors that influence BAC (breast artery calcification);

• Found a cut point of BAC that identifies patients with over 30% CAS (coronary artery stenosis), using cluster analysis and Linear Discriminant Analysis.

Internet Education Foundation 01/2018-now

Data Analytics

• Used Excel to clean the dataset and calculate basic statistics for the App Challenge;

• Used QGIS to make maps showing the distribution of participants across the US;

• Used Logistic Regression to find factors that affect the submission result of teams.
