Python, R, SAS, SQL, Tableau

Location:
Arlington, VA
Posted:
October 28, 2020

Resume:

Lianjie Shan

202-***-**** adhdfh@r.postjobfree.com 1600 South Eads Street, Apt 315S, Arlington, VA

EDUCATION

George Washington University, Washington, D.C.

M.S. in Statistics, Jan 2019 - Jan 2021

Courses: Data Mining, Machine Learning, Advanced Time Series

GPA: 3.89 / 4.0

Northeast Electric Power University, Changchun, China

B.S. in Information and Computing Science, Sep 2010 - Jun 2014

Second Degree: Bachelor’s Degree in Engineering

SKILLS

Programming: Python (pandas, NumPy, scikit-learn, SciPy), SQL, R, SAS, Tableau

Data Analytics: Experiment Design, Hypothesis Testing, A/B Testing, Analysis of Variance and Regression

PROJECTS

Research of Car Insurance Plan Apr 2020 – May 2020

• Addressed the asymmetric data distribution by processing the data step by step (classification, then prediction)

• Removed outliers, used a decision tree for feature selection, and picked the top 5 variables (Age, Bluebook, etc.) to build the model

• Trained 4 classifiers (SVM, logistic regression, LDA, QDA) in Python and selected LDA as the final classifier because it had the highest accuracy (76%)

• Built a predictive model with a multilayer perceptron (MLP) after tuning the number of neurons and epochs; evaluated it with the MSE cost function and obtained an acceptable error (MSE = 0.03)

Sulfur Dioxide Prediction Model Based on Linear Regression Jan 2019 – May 2019

• Forecasted SO2 air pollution levels from local characteristics (temperature, days with precipitation, etc.) for environmental companies

• Ran regression diagnostics, including residual analysis and Cook’s D, to check for outliers and high-leverage points; chose multivariate linear regression as the modeling approach

• Checked Pearson correlation coefficients to detect collinearity and transformed variables to correct heteroscedasticity

• Applied stepwise selection and backward elimination in SAS to build a multivariate linear regression model for SO2 prediction with a satisfactory R-squared (0.64)

Avito Duplicate Ads Detection May 2019 – Aug 2019

• Developed a predictive model that automatically detects duplicate ads, based on 4 million pairs of ads

• Identified candidate features and generated new variables (Distance, Same Location, etc.) from the raw data after merging the datasets and removing NA values in R

• Applied 4 classification methods (XGBoost, random forest, etc.) and selected XGBoost as the final classifier because it had the highest AUC score (0.7624)

WORK EXPERIENCE

Jilin Electric Power Co., Ltd. Baicheng Power Generation Co. Sep 2014 – Nov 2017

Electrical Engineer, Jilin, China

• Compiled statistics on factory-used 10 kV energy and analyzed the loss rate of power consumption

• Compiled statistics on plant generating capacity and on-grid energy; collected and monitored environmental protection data

• Maintained electrical systems and devices; tested electronics for faults and proposed solutions


