Xiang Cao
Tel: (518) ***-****
Seeking a position as Data Scientist / Data Analyst / Statistician


Education:

Bowling Green State University, Bowling Green, OH 2014 - 2016

§ Master of Applied Statistics, GPA: 3.85, Specialization: Business Analytics, Full Scholarship

Johns Hopkins University Nine Course Data Science Specialization, Coursera 2015 - 2016

Nanodegree Program: Machine Learning Engineer, Udacity (Ongoing) 2016 - 2016

Technical Skills:

§ Programming Languages: Python, R

§ Data Wrangling/Analysis: SQL, dplyr, Excel

§ Machine Learning: Scikit-learn, Keras, Pandas

§ Data Visualization: ggplot2, Tableau, Shiny

§ Statistics: Experimental Design, A/B Testing

§ Others: IPython Notebook, R Markdown, Git, GitHub

Machine Learning Experience:

Data Science Competition: Donor Identification, 1st Place Bowling Green State University, 2016

§ Identified donors and predicted donation amounts from a fundraising organization's dataset (100k records). Achieved a 50% improvement in donor identification over a random baseline.

§ Handled class imbalance (positive class under 5%) through data re-sampling, including under-sampling and over-sampling (Synthetic Minority Over-sampling Technique, SMOTE), and through cost-sensitive learning.

§ Feature selection: weighted and ranked 142 features representing customer, neighborhood demographics, and socio-economic status using Lasso and random forest.

§ Model selection: compared Lasso, Ridge Regression, Logistic Regression, Random Forest, KNN, and other models using F-measure and ROC curves; selected the optimal re-sampling ratio for each model.

Kaggle Competition: Allstate Insurance Claims Prediction, Leaderboard Top 5% 2016

§ Predicted the cost, and hence the severity, of insurance claims from encrypted data. Built boosted tree models with XGBoost and a three-layer feed-forward neural network with Keras (Theano backend).

§ Tuned parameters with Bayesian optimization methods. Reduced overfitting by ensembling individual models trained on data subsets.

Data Analyst Intern: Text Classification of Reviews Megaputer Intelligence Inc., 2015

§ Identified the underlying categories of aircraft incidents from the text records of Federal Aviation Incident Reports. Extracted and fused textual information with hidden knowledge through taxonomy and seasonal pattern analysis.

§ Improved topic classification through Latent Dirichlet Allocation (LDA). Each report is represented as a distribution of topics; each topic is represented as a distribution of words in the corpus.

Statistics Consulting & Business Experience:

Director of Center for Business Analytics Bowling Green State University, 2016

§ Provided statistical consulting services for faculty and students in the university. Helped clients with regression analysis, variable selection, and ANOVA.

§ Case Example: given a dataset of butterfly egg hatch counts and longevity under different treatments and covariates, analyzed which treatment combinations had the most significant effects on the response variables.

Marketing Specialist OPPO Mobile Co., Ltd., CN, 2011 - 2013

§ Core member of the new product marketing team. Developed marketing promotion strategies, including product selling points, retail store display, and retail experience enhancement. Successfully promoted OPPO Find5, Finder, and R807 to a larger customer base.

§ Group leader of training. Responsible for developing the training curriculum for retail agencies, including product knowledge and highlights, sales strategies, and customer service.

Honors & Awards:

§ 1st Place Award for 2016 Business Analytics Case Competition, Bowling Green State University 2016

§ Merit-based Full Scholarship for academic year 2015-2016, Bowling Green State University 2015

§ Merit-based Full Scholarship for academic year 2014-2015, Bowling Green State University 2014

