Python, Java, SQL, Pandas, SkLearn, Matplotlib, Seaborn, Numpy, Scipy

Location:

Worcester, MA

Posted:

September 28, 2017


Resume:

Shaowei Gong

508-***-**** *******@*****.*** ** Somerset St. Apt 2, Worcester MA 01609

OBJECTIVE: Looking for a Data Scientist position

SUMMARY:

Master's degree in Data Science with a background in Statistics. More than two years of experience in the field of analysis, with solid knowledge of Machine Learning, Statistics, Data Mining, and Data Visualization.

EDUCATION

Worcester Polytechnic Institute (WPI), Worcester, MA

Master of Science in Data Science, GPA: 3.7/4.0, May 2017

Southwestern University of Finance and Economics (SWUFE), Chengdu, China

Bachelor of Science in Statistics, GPA: 3.7/5.0, July 2015

SKILLS

Programming/Scripting Languages: Python, Java, SQL

Frameworks/Libraries and Tools: Pandas, SkLearn, Matplotlib, Seaborn, Numpy, Scipy.Stats, Spark, D3, Git, AWS

WORK EXPERIENCE

Data Scientist Intern, Dana-Farber Cancer Institute, Boston, US, Jan. 2017 - Apr. 2017

Worked with data from start to finish, including data preprocessing, hypothesis testing, text mining, and data visualization.

• Preprocessed raw data and completed exploratory data analysis using the Pandas, Seaborn, and Matplotlib libraries.

• Conducted hypothesis tests to measure the effect of treatment using the Numpy and Scipy libraries.

• Implemented text mining for patient records based on unsupervised learning (LDA, Word2vec, semantic analysis) using the Sklearn and Gensim libraries.

Data and BI Engineer Intern, Houghton Mifflin Harcourt Learning Technology, Boston, US, May 2016 - Aug. 2016

Implemented a data visualization dashboard to track the real-time metrics of the products.

• Extended the database schema and completed the ORM using the Hibernate framework.

• Built the data extraction modules to extract real-time data from web servers.

• Built RESTful APIs using Jersey that gave external users access to clean data.

• Implemented the data visualization of product metrics with the D3.js library.

PROJECTS

Allstate Claims Severity Prediction, Mar. 2017 - May 2017

Predicted the cost of claims based on users' historical data.

• Cleaned data and extracted features using the Pandas library.

• Implemented classification models for accident prediction and achieved an F1 score of 87%.

• Trained a regression model to predict the cost of claims and achieved an R-squared of 70%.

Forest Type Prediction, Jan. 2017 - Feb. 2017

Predicted the forest type based on spectral data collected from satellites.

• Preprocessed raw data, including missing-value imputation, outlier detection, and scaling.

• Reduced 57 features to 4 principal components using PCA, retaining 81% of the variance.

• Implemented and tuned a classification model that achieved 89% prediction accuracy.

Data Visualization Dashboard for New York Restaurant Hygiene Conditions, Oct. 2016 - Dec. 2016

Built an interactive data visualization for exploring restaurant hygiene conditions in New York City.

• Preprocessed raw data, extracted aggregated statistics from the clean data, and reshaped the data.

• Designed the interactive data visualization and dashboard webpage, and implemented them with D3.js.

K-Means Clustering Based on Spark, Sep. 2016 - Oct. 2016

Implemented the K-Means clustering algorithm on the Spark framework.

• The Spark implementation of the K-Means algorithm enables clustering of terabyte-scale datasets.

• Improved computational efficiency by 95% compared to a single-node implementation of the K-Means clustering algorithm on massive datasets.
