
Data Analysis Machine Learning Technology

Durham, North Carolina, United States
January 09, 2018



Mengrui Yin

Phone: 206-***-**** Email:


Duke University, Durham, NC
• Master’s in Statistical Science, 08/2016 – 05/2018
• Coursework: Machine Learning, Predictive Modeling, Advanced Stochastic Modeling, R, Statistical Computation (Python), Bayesian Methods and Modern Statistics Processing: Categorical Data, Database System (AWS, Scala, NoSQL)

University of Washington, Seattle, WA
• Bachelor of Arts, Major in Mathematics and Economics, 09/2012 – 06/2016
• Coursework: Stochastic Calculus for Option Pricing, Computational Finance and Financial Econometrics, Econometric Theory and Practice


Analysis Skills: Machine Learning Algorithms, Big Data Queries and Interpretation, Predictive Modeling and Model Checking, Hypothesis Testing
Technological Skills: Python, Database, SQL, RStudio, Stata, LaTeX, Microsoft Excel
Tools: Jupyter Notebook, Git, Linux


Data Scientist, Inspur Information Technology Company, China, 06/2017 - 08/2017
• Extracted 2015 traffic violation data from a dataset and processed it with Python, including using an API to convert the data and using matplotlib and heatmaps for data visualization
• Applied BIRCH and k-prototypes clustering and used folium and gmaps in Python to visualize the clusters on a map
• Created a web page to show all results and proposed traffic management suggestions to managers

Business Analyst, MAN Truck & Bus, China, 01/2015 - 02/2015
• Set up Excel PivotTables to present and analyze data on truck quality assurance costs

PERSONAL PROJECTS

Implementation of LDA with Python, 04/2017 - 05/2017
• Implemented LDA using Gibbs sampling and used Cython to speed up the algorithm
• Applied the algorithm to Associated Press news articles, extracting the top 10 words for each topic and inferring the topics of the articles
• Implemented LDA using the EM algorithm
• Compared the Gibbs sampling version with the EM version in perplexity and efficiency

Prediction and Model Selection, 02/2017

• Processed data on house characteristics, including data imputation and transformation
• Proposed models for price prediction and calculated the RMSE of each model on the training data
• Selected a Lasso model as the final model and used diagnostic plots for model checking
• Applied the model to test data to predict prices and proposed suggestions for housing investment

Modeling Categorical Data, 01/2017

• Converted the categorical response data (treatment of lung cancer) to 0 and 1 and converted the design matrix correspondingly
• Applied probit regression to the converted data using both Bayesian and frequentist approaches

NBA Tracking Data Analysis, 11/2016 - 12/2016

• Scraped basketball game data from a website and used dplyr to clean the data
• Built a Shiny app to show the path of a selected player on the court during any given play and a histogram of the player's FG%
