Data Scientist

Location:

New York, NY

Salary:

80k

Posted:

March 01, 2017

Resume:

Alex Yuan Li

New York City, NY ***** Cell: 435-***-****

****.*****@*****.*** linkedin.com/in/alex-yuan-li www.github.com/yali107

EDUCATION

M.S. in Mechanical Engineering, Columbia University – New York, NY Feb 2016
Related Coursework: Statistical Machine Learning, Data Mining, EDA and Visualization

B.S. in Chemical Engineering, University of Utah – Salt Lake City, UT May 2012

SKILLS AND TOOLS

Machine Learning Algorithms: SVM, Ensemble Learning (AdaBoost, Gradient Boosting, Random Forest), PCA, k-Nearest Neighbor, Expectation-Maximization/k-means, Neural Networks, Natural Language Processing

Languages & Packages: Python (scikit-learn, pandas, Scrapy, Flask), R (ggplot2, dplyr, caret, Shiny), MATLAB

Database Tools and Frameworks: Spark, Hadoop, MapReduce, AWS, SQL (MySQL, PostgreSQL)

Engineering Software: SolidWorks, LabVIEW, Minitab

PROFESSIONAL EXPERIENCE

Data Science Fellow, NYC Data Science Academy – New York, NY Sep 2016 – Jan 2017

A 12-week intensive data science fellowship covering machine learning, Python and R development, big data, deep learning, and data visualization

Ninkasi: Created a beer recommender system using the Python Flask web framework
o Scraped and preprocessed over 280,000 reviews and ratings from the popular beer site ratebeer.com
o Implemented a content-based NLP model from user reviews using TF-IDF and LSI to find similar beers
o Developed collaborative filtering models using TensorFlow that reached 92% prediction accuracy

Allstate Claims Severity: Scored in the top 9% in the Kaggle competition using a stacked model of multiple gradient boosting trees, feedforward neural networks, and random forest

TV Show Analysis: Scraped show ratings from three online databases using Python Scrapy and analyzed differences between the scoring systems using statistical techniques including Principal Component Analysis (PCA) and Analysis of Variance (ANOVA)

PokeViz: Developed a prediction and visualization application for the mobile game Pokémon Go using R and Shiny
o Implemented a k-NN algorithm to help players predict the rarity of the Pokémon at any location
o Created and managed multiple relational databases using SQL to serve as training and test data sets

Research Assistant, Inner Mongolia University – China Jun 2015 – Sep 2016

Conducted research in bioinformatics, applying machine learning methods to biological problems

Implemented Support Vector Machine (SVM) classification using the LIBSVM library to classify bidirectional promoters vs. coding gene/intergenic regions of prokaryotic DNA sequences, achieving 92.3% cross-validation accuracy

Created a Python package for generating different feature vector modes for pseudo-DNA sequences, incorporating physicochemical properties and sequence-ordering effects

Preprocessed about 80,000 microarray gene expressions and calculated fold changes using R

Research & Development Engineer, Becton Dickinson – Sandy, UT Jan 2013 – Jul 2014

Focused on the development of a new lubrication process for intravenous catheter medical products

Analyzed large quantities of experimental data on a daily basis using statistical methods (ANOVA, hypothesis testing) in Minitab for new design verification and process validation

OTHER RELATED PROJECT EXPERIENCE

Machine Learning Projects Mar 2015

Performed PCA in R to find principal components from over 2,000 facial images and reconstructed the original facial images from these principal components

Classified handwritten digit data from the USPS database with 93.4% accuracy in R using both SVM and AdaBoost algorithms, with decision stumps as weak learners
