Di Zhu
*** *** ******, ****, *******, NJ *****
*****@*******.***
OBJECTIVE: To obtain an internship or a full-time position as a Business Analyst or Data Scientist
EDUCATION: Stevens Institute of Technology, Hoboken, NJ
Master of Science in Business Intelligence & Analytics Expected 05/2016
Hefei University of Technology, Hefei, China
Bachelor of Engineering in Chemical Engineering 06/2010
EXPERIENCE: Stevens Institute of Technology, Hoboken, NJ 10/2015 – Present
Graduate School Student Assistant
Helped industry students with Hadoop Pig & Hive exercises in Stevens Institute of Technology’s Hadoop Bootcamp
Graded homework for the Optimization and Process Analytics course
Davidson Lab, Hoboken, NJ 05/2015 – 08/2015
Research Assistant, Data Analyst
Extracted and processed the U.S. Northeast Urban Ocean Observatory data set using Python
Clustered observation stations into 3 groups and built a separate time-series machine learning model for each group to predict storm surges in the U.S. Northeast
Improved the astronomical water level forecasting model by applying machine learning techniques in Python with scikit-learn
China Petroleum & Chemical Corporation, China 06/2013 – 08/2013
Summer Internship, Data Analyst
Performed extraction, transformation, and loading (ETL) of batch retail data via SAP
SKILLS: Certifications: SAS Base Programmer, Essential Bloomberg
Programming & Statistics: Python, R, SQL, SAS, Hadoop Pig & Hive
Languages: English (fluent), Mandarin Chinese (native)
PROJECTS: Decision Support System
Built a decision support system in Python to help MultiMagazine Inc. select marketing targets
Normalized the categorical variables and filled in the missing values
Randomly split the data set into training (70% of observations) and testing (30%) sets
Fitted Logistic Regression, CART, Support Vector Machine, and Naïve Bayes models to the training set
Evaluated the 4 algorithms using ROC curves, Precision-Recall curves, error rate, and confusion matrices, and selected the most suitable one based on the company’s benefit/cost marketing matrix
Developed a user-friendly environment for running the predictive model
Boston House Value Analysis
Reduced the dimensionality of the Boston housing dataset with Principal Component Analysis in R and SAS
Identified the 5 of the 14 variables that best explained house prices using AIC-based subset regression
Performed multivariate regression to examine the relationship between house prices and the selected variables
TV Reviews Analysis
Collected more than 4,000 TV set reviews from 4 different websites using a Python script
Performed a text mining process on the reviews
Developed a decision tree to select useful reviews based on rating, length, and review tags