
Data Information

Irvine, CA
June 29, 2018



Xuexuan (Bill) Hu

** ******** ***, ******, ** ***12 949-***-****


University of California, Irvine, Irvine, CA

Donald Bren School of Information and Computer Sciences, B.S. in Data Science

SKILLS

• Programming and design skills with Python (scikit-learn, numpy, scipy, pandas, gensim), Java, HTML/CSS/JS, and C++

• Proficient in machine learning: classification, regression, clustering, feature engineering

• Expert in SQL: databases (MySQL, PostgreSQL, MongoDB, AsterixDB), data management and analysis

• Data collection and analysis in R: time series, regression models, hypothesis testing and confidence/credible intervals, principal component analysis and dimensionality reduction

• Website design and maintenance

• Familiar with assembly language and Tableau


Database for Education department March 2017 – June 2018

University of California, Irvine, Irvine, CA

• Designed an E-R diagram and data structures based on Excel files (8 courses and 1109 students)

• Cleaned up the data sets (duplicated data, null values, incorrect information, unstructured data) and performed data wrangling (sparse table vs. key-value table)

• Created a system in Python that reads Excel files and builds a PostgreSQL database with update functionality

• Analyzed data for students with different grades and home languages

Database for Yelp February 2017 – March 2017

University of California, Irvine, Irvine, CA

• Created a database for Yelp from a JSON file (274 thousand records)

• Converted customers' ratings from numerical to categorical values (from number of stars to good or bad rating) and predicted ratings from review comments using different models (logistic regression, TF-IDF, kNN) and stop word frequencies (0.1% and 0.01%)

• Performed sentiment analysis on customer review comments and found the 10 words with the most positive and the most negative weights (most positive: perfection, 2.716; most negative: horrible, -2.283)

Database for IMDB January 2017 – February 2017

University of California, Irvine, Irvine, CA

• Extracted data from IMDB (1.7 million movies and 6.8 million people) and stored the data in PostgreSQL

• Connected the PostgreSQL database to a Python notebook and manipulated the data

• Visualized the data with plots and analyzed trends in the movie industry over time (the population reached its heyday around 1980 but entered a recession after 2000)

Rainfall Prediction April 2016 – June 2016

University of California, Irvine, Irvine, CA

• Built machine learning models to predict whether there is rainfall at a location based on processed infrared satellite image information

• Combined two ensemble models with weighted learners: an mltools ensemble (decision tree, neural network, random forest, linear classifier) and an sklearn ensemble (logistic regression, kNN, decision tree, neural network, random forest, gradient boosting, AdaBoost)

• Improved validation AUC from 0.61 to 0.78 and Kaggle score from 0.58 to 0.78

Tableau Data Visualization April 2016 – May 2016

University of California, Irvine, Irvine, CA

• Developed skills in building interactive worksheets and dashboards using a built-in Tableau data set

• Created layouts for different devices (e.g., tablet, phone, laptop)

• Merged worksheets and added interactivity according to the working specification
