Xiao (Shawn) Chen
LinkedIn: https://www.linkedin.com/in/xiao-shawn-chen-86511a69/ Portfolio: http://rpubs.com/Cx530548220
Operating Systems: Linux, macOS, Windows
Machine Learning and Deep Learning: Python, PyTorch, PySpark, scikit-learn
Statistical Packages: R, SAS, Excel
Databases: MySQL, SQL, Teradata, Google BigQuery
Cloud Platform: Google Cloud Platform
Visualization and Business Intelligence: ggplot2, R Markdown, Tableau
EDUCATION AND CERTIFICATION
University of Missouri, Columbia, Missouri
M.A. Statistics Aug. 2015 - May 2017
Anhui Polytechnic University, China
B.S. Electrical Engineering and Automation Sept. 2010 - July 2014
INTERNSHIP EXPERIENCE
Youzu Interactive Co. Ltd. Shanghai, China
Assistant Game Analyst Jan. 2015 - July 2015
Generated weekly and monthly reports for business users according to business requirements, manipulating and mining data from database tables with MySQL, R, and Tableau.
Adjusted the start time, duration, activity awards, and other game settings to improve user activity, leading to a 12% increase in retention and a $2 million rise in monthly revenue.
Performed game analysis on League of Angels, Facebook’s 2015 Best Web Game, with $25 million monthly and $72 million annual revenue in the North America region.
MACHINE LEARNING AND DEEP LEARNING EXPERIENCE
Implemented ConvNet with NumPy on Google Cloud Platform Jan. 2018 – Present
• Used PyTorch to build a VGG-16 architecture for CIFAR-10 image classification with GPU support.
• Applied He initialization, ReLU, batch normalization, dropout regularization, and Adam optimization for model training. Achieved 81.3% accuracy.
Recruit Restaurant Visitor Forecasting in Kaggle.com Dec. 2017 – Feb. 2018
Created interactive data analysis in R with ggplot2 and R Markdown.
Cleaned data, merged datasets, and split existing variables for feature engineering.
Used XGBoost to build a gradient boosting tree model with 0.514 RMSE, ranking in the top 15%.
Credit Card Fraud Detection Oct. 2017 – Dec. 2017
Built a multivariate Gaussian anomaly detection system with NumPy for fraud detection.
Used cross-validation to choose the epsilon threshold based on F1, recall, and precision scores; applying this epsilon to the training dataset achieved a 0.763 F1 score and 93% accuracy.
Prediction of Client’s Charity Behavior Sep. 2017 – Nov. 2017
Addressed the skewed class distribution with the SMOTE oversampling method.
Built a random forest regression model for missing-value imputation.
Used a logistic regression model with an L1 penalty for behavior classification, achieving 0.82 F1 and 0.82 AUC.
Built a linear model trained with stochastic gradient descent to predict the donation amount, achieving 3.7 MSE.
Leaf Classification in Kaggle.com Sep. 2016 - Dec. 2016
Competed in the Kaggle.com Leaf Classification competition and ranked in the top 12%.
Used PCA to reduce data dimensionality while retaining 98% of the variance.
Normalized data and applied L2 regularization to avoid overfitting.
Built an SVM classification model with a Gaussian kernel, using 5-fold cross-validation to choose C and gamma, and a KNN classification model with a large K to avoid overfitting.