Xiang Gao
SKILLS
Programming: Python (pandas, scikit-learn, NumPy, NLTK, spaCy, TensorFlow), SQL, R, Spark, HTML
Skills: Regression, Classification, Clustering, NLP, Forecasting, Recommendation System, MapReduce, Data Wrangling, Web Scraping, Git, AWS, GCP, Linux
EDUCATION
University of Illinois Urbana-Champaign (UIUC) Sep 2017 - Dec 2018
M.S. Computer Science GPA 3.91/4.0
Text Information System, Applied Machine Learning, Cloud Computing Application, Advanced Bayesian Modeling, Data Visualization, Practical Statistical Learning
Lehigh University Jun 2010 - Sep 2012
M.S. Mechanical Engineering and Mechanics
EXPERIENCES
The Data Incubator (TDI) Fellow Sep - Nov 2018
• Applied data wrangling, machine learning and distributed computing to massive datasets, including data scraped from Wikipedia, StackOverflow, Yelp and New York Social Diary
• Built an app using the Quandl API, Flask, Heroku and Bokeh to visualize stock prices and trends
Embraco Whirlpool Industrial Engineer Jun 2015 - Aug 2017
• Created data visualizations of weekly quality data using Tableau, Excel, matplotlib and seaborn
MCS Industries Industrial Engineer May 2013 - May 2015
• Built a logistic regression model in Python to forecast defects in new products, reducing cost by 18%
PROJECTS
Loan Default Prediction UIUC Sep - Dec 2018
• Merged data from multiple sources, performed feature engineering and used LightGBM as a baseline
• Built a pipeline using Imputer, Scaler, LDA and XGBoost and ran a randomized search over the parameter ranges
• Used GridSearch to find optimal parameters and achieved 24% higher accuracy
Walmart Sales Forecast UIUC Sep - Dec 2018
• Applied linear regression and local regression to forecast weekly sales for each store and department
NYC Social Graph TDI Sep - Nov 2018
• Used BeautifulSoup and json to scrape data (1k webpages) and ran graph analysis using NetworkX
StackOverflow Analysis TDI Sep - Nov 2018
• Parsed data (10 GB) using lxml and PySpark to analyze statistics on posts, users, votes and tags
• Built a multi-class classification model to predict the tags of a question and achieved 92% accuracy
Games App Ranking Website TDI Sep - Nov 2018
• Built a website that shows the popularity of game apps using the Twitter API and topic modeling
Fake Review Detection System UIUC Jan - May 2018
• Built a k-means model to cluster reviews and found patterns of fake reviews in Hotel.com data
• Added user information to the training process and improved model accuracy by 11%
Image Classification with TensorFlow UIUC Jan - May 2018
• Built neural networks on CIFAR data using TensorFlow on the Colab platform, reaching 90% accuracy
Data Visualization UIUC Sep - Dec 2017
• Created an interactive visualization app using D3, JavaScript and Heroku to help users select wines
********@*****.*** 331-***-****
GitHub: xgao0412
EAD Available