Quan Yuan
San Jose, CA ***** *****@*****.*** 734-***-**** GitHub
EDUCATION
University of Michigan
MS in Entrepreneurship
GPA: 3.71 (Top 10% of Business graduate students)
Tsinghua University
BS in Industrial Engineering
BS in Economics
2010 - 2014 Beijing, China
Outstanding Practicum (top 1 out of 60)
GPA: 3.84
COURSEWORK
• Machine Learning
• Data Manipulation
• Exploratory Data Analysis
• Information Retrieval
• Data Structures & Algorithm Design
• Operations Research
• Probabilistic User Behavior Modeling
• Database Design
• Experimental Design
• Applied Statistics & Probability
• Management Information Systems
• Simulation
• Econometrics
• Game Theory
SKILLS
Languages
Python • Java • R • C/C++ • MATLAB • MySQL • PHP • HTML/CSS • JavaScript
Tools
NumPy • SciPy • Pandas • scikit-learn • Theano • NLTK • Spark • Matplotlib • CakePHP
WORK EXPERIENCE
Data Scientist (Summer Intern)
June 2013 to August 2013 Foxconn Technology Group - Yantai, China
• Defined task-complexity metrics; built linear regression models with Lasso or Ridge regularization to predict the ramp-up capacity of production lines; estimated coefficient intervals via bootstrapping, achieving a 16.2% increase in accuracy.
• Applied a genetic algorithm to find the best match between workstations and workers based on task complexity; implemented the online forecast system and a simple personnel management system.
Product Manager (Summer Intern)
May 2015 to August 2015 YONO Health Inc. - Sunnyvale, CA
• Created the low-fidelity prototype for the mobile app and managed the graphic designer and the programmer during mobile app development.
• Managed the hardware development progress of the outsourced design firm.
• Created a usability test protocol following FDA guidance; also built a functional testing plan and a marketing test survey.
ACADEMIC PROJECTS
Statistical Analysis on Taxis in New York City
School of Information, University of Michigan - Ann Arbor, MI
• Cleaned the 2013 NYC taxi operations data (about 300 million rows) using PySpark and visualized the distribution of taxis based on location data converted via the Bing Maps API.
• Predicted tip amount with stacked models (GLM, MARS, P-spline regression, and gradient boosting trees) with L1 post-processing; tuned the models by AIC and/or GCV; evaluated and visualized the effects of variables.
• Implemented ISLE (Importance Sampled Learning Ensembles) based on scikit-learn.
Statistical Quality Analysis of Tea Polyphenols (TP) in Tea Drinks
Data Analytics for Quality Excellence (DATE) Lab, Tsinghua University - Beijing, China
• Collected and cleaned the raw profile data (spectrum curves); reduced its dimensionality from 1751 to 30 using discrete wavelet analysis and denoised the data using the soft-thresholding method.
• Applied an SVM model to identify different brands of tea drinks based on the cleaned data; tuned model parameters by cross-validation.
• Identified the most important features by minimizing the generalized classification error using a genetic algorithm.
Sentiment Analysis on Tweets
School of Information, University of Michigan - Ann Arbor, MI
• Cleaned the 1.6 million tweets in the Sentiment140 dataset (tokenization and named-entity replacement) using MRJob.
• Implemented a statistical language model for tweet classification; tuned the model parameters by cross-validation and reached 94% accuracy on a sample of 16K tweets.
• On a second dataset, Twitter US Airline, built a similar model and identified the words with the highest predictive contributions.
• Implemented a discrete Hidden Markov Model for POS tagging and sped it up with Cython.
Otto Group Product Classification
• Trained and tuned multiple machine learning models (KNN, logistic regression, ensembles, SVM, etc.) for stacking; trained a Deep Belief Network with the outputs of these models as input.
• Implemented a deep neural network module on top of Theano.