TIANJING CAI
Cupertino CA 206-***-**** *****@**********.***
EDUCATION
MS, Data Analytics, Georgetown University, Washington DC Aug 2017 – May 2019
BS, Mathematics, University of Washington, Seattle, Washington Aug 2013 – Aug 2017
TECHNICAL & LANGUAGE SKILLS
Languages: Python, R, SQL, MATLAB, Pig Latin, Hive, Java, Excel, Racket, Ruby
Software and platforms: RStudio, Tableau, R Shiny, Greenplum, Hadoop, Spark, NumPy, Keras, TensorFlow, SQL Server
PROFESSIONAL EXPERIENCE
Statistical Intern, comScore Inc. Reston, VA May 2018 – Aug 2018
Viewability threshold research
• Conducted research regarding raising current desktop user viewability threshold from 0.5 to 1.0.
• Extracted and aggregated weekly viewability data (3 billion records, 1 TB) from Hadoop using Apache Pig. Visualized the impact of viewability thresholds on frequency and reach in R.
• Authored high-level summary report for Media Rating Council.
Parameter tuning on Snowman Demo Assignment model
• Extracted weekly demographic and gender-age data from Hadoop using Apache Pig and filtered data on Greenplum using SQL.
• Tuned parameters of the SDA model with parallel programming using R’s foreach package.
• Achieved an average 5% increase in accuracy and 500 increase in coverage over production-level parameters for the top 5 parameter combinations.
• Presented impact of each parameter on accuracy and coverage to production team.
Optimizing UV model using neural network
• Built a 2-layer regression neural network model with 5-fold cross-validation in Python to predict monthly hits per machine on iOS for each website.
• Performed feature engineering and normalization, and explored different transformations of the response variable (log, Box-Cox) to address skewness.
• Built model with tuned dropout rate, customized loss function, step decay learning rate using GridSearchCV.
• Achieved a 10% increase in prediction accuracy over the original Gradient Boosting Machine model.
• Documented procedures and findings on company’s Wiki page and managed all code on Jira.
Digital Marketing Analyst Intern, naisA Global, Washington DC Feb 2018 – May 2018
• Linked Google Tag Manager to company webpages to collect user information and applied a random forest model to predict user conversion rates.
• Achieved 95% accuracy in predicting conversion on the test dataset.
• Generated a report for the marketing manager based on model insights; achieved a 10% increase in monthly impressions and a 5% increase in conversion rate.
SELECT PROJECTS
Georgetown University, Washington, DC
Trends of popular music Sept 2018 - Dec 2018
• Analysed the success of Billboard top 100 songs from the past 10 years. Used APIs to collect data from Billboard, Twitter and Spotify. Predicted sentiment scores for Twitter data and song lyrics using a convolutional neural network model.
• Visualized characteristics of popular songs in terms of artists, lyrics and soundtrack using Tableau, ggplot, seaborn, leaflet, networkD3 and R Shiny.
• Performed regression test to examine relationship between emotions in lyrics and related tweets.
• Created webpages using HTML, CSS and JavaScript that embed all visualizations to present analysis and conclusions.
Twitter Sentiment Tracking for Stock Prediction Sept 2017 - Dec 2017
• Analysed customers’ reactions to Apple announcements on Twitter alongside the Nasdaq stock price.
• Collected both Twitter and stock data using APIs. Performed word2vec transformation for each tweet and built a convolutional neural network to predict sentiment.
• Performed exploratory data analysis using cluster analysis and association rule on stock data.
• Merged daily Twitter sentiment scores with stock data; predicted next-day stock trend using random forest and achieved a highest average accuracy of 87% under 10-fold cross-validation.