Siman Peng (permanent resident)
* ******** **, ******, ** 92617 Tel: 607-***-**** *****@*******.***
Education
Cornell University, Ithaca, NY August 2013
Master in Biology, Minor in Statistics
Fudan University, Shanghai, China July 2008
Bachelor of Science in Biology
Skills
• Programming Languages: R, Java (with Spring), Python, SQL
• Tools: Hadoop, MapR, Pig, Tableau, Git, Linux, Bash
Experience
Data Consultant (part-time) Aug 2016 – Nov 2016
Mobilityware, Irvine, CA
• Used Python to identify users who play multiple games
• Used Tableau to generate revenue reports
Data Scientist Jan 2015 – Nov 2015
Millennial Media (Acquired by AOL), Boston, MA
Millennial Media is the leading mobile ad marketplace, placing display ads on mobile devices. Its data and technology assets enable advertisers to connect with target audiences at scale.
• Plotted quantiles of one week's model scores with Python
• Wrote anomaly detection for model scores in Python
• Evaluated the eCTR performance of the current system at different attribute levels with Pig
• Analyzed the IP addresses of sample IDs with Pig to distinguish locations, and presented the results in Tableau
• Wrote Kolmogorov-Smirnov tests in R for pairwise comparisons of bid prices
Data Scientist Jan 2014 – Jun 2014
Digital Roots, Northville, MI
Digital Roots uses cutting-edge artificial intelligence (Social AI) technology to manage text data from social networks, leaving users with a concentrated volume of relevant conversations that they can interact with faster and more efficiently.
Analyzing Twitter data
• Wrote a Naïve Bayes classifier in R, using the Yarowsky algorithm, to find more customers interested in buying the product
• Used bagging and Good-Turing estimation to improve precision and recall
• Achieved a precision of 61% and a recall of 66%
Relevant Courses
• Statistics: Statistical Data Mining, Applied Linear Statistical Models via Matrices, Multivariate Analysis, Theory of Statistics, Probability, Statistics I, Statistics II, Quantitative Genetics
• Computer Science: Object-Oriented Programming and Data Structures, Introduction to Computing Using Python, Mathematical Foundations for the Information Age, Bioinformatics
Cornell Course Projects
Predicting Human Faces Fall 2013
• Predicted complete human faces from their top halves, using training data of 300 faces
• Implemented the k-nearest neighbors algorithm in R
Picture Quality Analysis (Kaggle Competition) Spring 2012
• Predicted whether a picture is good or bad
• Implemented k-means clustering in R using a distance metric defined by latitude and longitude
• Fit a logistic regression within each cluster on the remaining predictors, such as image size, name, and caption
• Plotted the ROC curve and achieved a successful prediction rate of 83%