
Data Customer Service

Temple City, California, United States
January 29, 2018




Data Scientist


University of New Haven / Galvanize, San Francisco

Data Science, Master

University of California, Irvine

Economics/Mathematics, BA


Data Science Intern, Scientific Revenue, San Francisco, CA Aug 2017 – Dec 2017

Led a project using neural networks to build a Customer Lifetime Value prediction model on terabytes of real-world gaming data

Optimized several models, including customer segmentation and churn classification, using grid search and cross-validation

Provided detailed analysis reports with clear visualizations of customers’ behaviors based on their monetary value and conversion rate on an hourly basis

Operation Team Administrator, Panda Restaurant Group, Rosemead, CA Aug 2016 – Nov 2016

Assisted with fulfillment of store inventory and placed orders to keep sufficient inventory available to the stores

Updated the inventory data and managed the e-commerce software to keep inventory status accurate

Provided customer service related to shipping and inventory issues

Real Estate Analyst, Sunny Valley LLC, Alhambra, CA July 2015 – June 2016

Analyzed market trends of properties in assigned areas by compiling data into financial models for investment evaluation by senior management

Prepared forecast and variance analysis on a weekly basis

Handled lease preparation and resolved leasing issues with landlords/property managers


Best movie gross predictor Feb 2017 -

Identified the best predictor of movie gross by fitting various statistical models to the IMDB Movie Dataset.

-Performed exploratory data analysis (EDA), data cleaning, and feature engineering

-Used Seaborn and Matplotlib to show the feature correlation and the distribution of the sample data

-Identified the most important features by comparing the significance level of each feature using an OLS model

Technologies: Pandas, Numpy, Matplotlib, Statsmodels, Scipy, Seaborn

Meetup RSVP May 2017 -

Built a data pipeline to consume RSVP data from the Meetup API, then applied a machine learning algorithm to the data.

-Built a data pipeline using WebSocket to stream data from the Meetup API and store it in AWS S3

-Performed real-time analysis using Kafka and Spark Streaming

-Classified events by their descriptions using the Multinomial Naïve Bayes algorithm

Technologies: Pandas, Numpy, Matplotlib, Sklearn, Pyspark, NLTK, Websocket, AWS, Kafka

Photo Interestingness July 2017 -

Trained a CNN model over 5000 movie screenshots to determine the interestingness level of the content shown.

-Balanced the dataset by using SMOTE and re-weighting the classes

-Used three different pre-trained Convolutional Neural Network (CNN) models (VGG19, ResNet50, InceptionV3) to compare the results

Technologies: Pandas, Numpy, Matplotlib, Sklearn, Keras, SMOTE

Amazon reviews classification Sep 2017 -

Used Natural Language Processing to determine whether a product has a positive rating or negative rating based on its user reviews.

-Applied regex to clean the text data

-Used TF-IDF to vectorize (unigrams and bigrams) over 1.6 million user reviews

-Built a Logistic Regression model and an SGD model, both with over 80% accuracy

Technologies: Pandas, Numpy, Matplotlib, Sklearn, NLTK, Seaborn

Customer Lifetime Value Prediction Oct 2017

Built a Customer Lifetime Value prediction model with a Multilayer Perceptron.

-Trained an Autoencoder to perform unsupervised feature engineering

-Applied different sampling methods (SMOTE, Oversampling) on the training data for model comparison

-Built a joint Multilayer Perceptron model by training a classification model and a regression model, which outperformed the Random Forest and XGB models

Technologies: Pandas, Numpy, Matplotlib, Sklearn, Redshift SQL, Keras, SMOTE, XGBoost, Scipy

Gym Crowdedness prediction Dec 2017 -

Used time series modeling to predict the total number of people at a local gym in the next hour.

-Performed feature engineering, plotted the hourly and weekly headcount distributions, and measured the correlation between features

-Built an ARIMA model using the historical data, which outperformed the Random Forest model with an RMSE of 8.8

Technologies: Pandas, Numpy, Matplotlib, Sklearn, Scipy, Seaborn, Statsmodels
