Ruoxuan (Tia) Tian
Houston, TX ***** 612-***-**** GitHub: https://github.com/tianruoxuan ***********@*****.***
Summary
• Energetic, results-driven analytical thinker with expertise in data analysis, machine learning, data manipulation and visualization, forecasting, reporting, project management, A/B testing, and Google Analytics.
• 3+ years of hands-on experience in predictive modeling, clustering, and digital analytics.
• 4+ years of strong programming skills in R, SAS, Python, SQL, Tableau, VBA, Hive, and Microsoft Office.
• Strong communicator and team player.
Education
Rice University, Houston, TX Aug. 2015-Dec. 2016
M.S. in Statistics
University of Minnesota – Twin Cities, Minneapolis, MN Sep. 2010-May 2014
B.S. in Mathematics, Department of Mathematics Scholarship
Udacity Machine Learning Engineer Nanodegree Oct. 2017-Jan. 2018
Working & Research Experience
Findream LLC Nov. 2017-Current
Data Analyst, Houston, TX
• Conducted exploratory data analysis and built predictive models (e.g., Random Forest) to predict customers' conversion propensity.
• Visualized correlations among KPIs (conversion rate, etc.) and reported findings to the client’s senior leadership team.
• Performed segmentation analysis based on consumer preferences to deliver verifiable, actionable recommendations to the client on customer service strategy.
• Utilized Microsoft Excel and SQL to compile, validate, query, and manipulate data in compliance with corporate standards.
MD Anderson Cancer Center June 2016-Sep. 2017
Research Analyst Intern, Houston, TX
• Collected, cleaned, and analyzed data using various statistical methods (regression models, ANOVA, hypothesis testing, etc.); quantified and visualized data to generate dashboards and reports with R/Shiny or Markdown.
• Leveraged the MDA 2016 pediatric sarcoma cohort as a clinical data source to test our algorithm LFSPRO (Li-Fraumeni syndrome), and improved AUC to 0.85.
• Co-built an R package by testing, correcting, and wrapping an algorithm for a Bayesian variable selection method ( https://cran.r-project.org/web/packages/BayesS5/index.html )
• Utilized HTML and CSS to create a website for the research lab ( http://odin.mdacc.tmc.edu/~wwang7/index.html )
New York City Taxi Trip Duration Oct.-Nov. 2017
• Performed exploratory data analysis to understand NYC taxi trip duration data.
• Created features and used PCA and K-means clustering to group attributes, reducing them where necessary.
• Built predictive models (feed-forward neural network, Random Forest, LightGBM) to predict NYC taxi trip duration and tuned hyperparameters with grid search to further improve model performance (MSE).
Kaggle Competition - Neural Speech Decoding Sept.-Dec. 2016
• Used multiple approaches to add non-linearity noise to the ECoG recordings (predictors) and to reduce multicollinearity in the power spectrum (response), in order to reconstruct audible human speech.
• Partitioned the data into several groups with similar power spectra based on breakpoints and sentence content; used PCA to reduce dimensionality and study trend structure.
• Built regression models (Ridge, Lasso, Elastic Net) on each group and performed cross-validation on these partitions to select the best ensemble model and its tuning parameters.
• Combined elastic net regression with added non-linearity as a segmented regression and assigned the component models optimized weights; the result ranked No. 1 on the public leaderboard.