Xuehan (Cathy) Liu
515-***-**** ******@********.***
*** **** ***** **, *** York, NY, 10025
SUMMARY
• A quick learner with an analytical mindset who thrives in both team and individual settings, able to understand and solve data-driven problems and to articulate results and concepts confidently
• Hands-on experience in data extraction and manipulation, statistical modeling, and machine learning using SQL, Python, Azure Web Services, R/RStudio, SAS, Power BI, and Tableau
EDUCATION
Columbia University, Graduate School of Arts and Sciences New York, NY
Master of Arts in Statistics (GPA: 3.2) 09/2016-12/2017
• Relevant courses: Statistical/Advanced Machine Learning, Linear Regression, Stochastic Processes
Iowa State University, School of Liberal Arts and Sciences Ames, IA
Bachelor of Science in Statistics and Mathematics (GPA: 3.85) 08/2012-05/2016
• Honors: Graduated Magna Cum Laude within University Honors Program, Dean’s List
• Relevant courses: Statistics/Statistical Modeling, Time Series, Linear Algebra, Money, Banking and Financial Institutions
PROFESSIONAL EXPERIENCE
VNB Consulting Services Edison, NJ
Data Scientist 02/2018 – 02/2018
• Scraped customer text reviews from HTML pages and prepared them for sentiment analysis in Python (BeautifulSoup, NumPy, pandas, etc.), including creating labels from review ratings and cleaning the text
• Conducted sentiment analysis in R and Azure for a retail client to understand its customers’ satisfaction with products; the best model, logistic regression, achieved 85%+ accuracy
• Connected to the Facebook API to extract the client’s social media data, turned it into actionable insights, and presented them in a Power BI dashboard to help the client develop its marketing campaign strategy
• Developed a logistic regression model on product data to predict the likelihood of customers returning a product, achieving 80%+ accuracy on both training and validation datasets
• Initiated use-case research after discovering the client’s interest in social media analytics applications, and communicated the results to the client in a summary report
Data Scientist Intern 11/2017 – 02/2018
• Extracted ~200 streaming tweets daily using Python and loaded unstructured data into Azure Blob Storage
• Created new data assets by adding unstructured social media text data, enriching the input variables fed into the sales/revenue prediction model and thereby improving prediction accuracy
PROJECT
Paintings Auction Price Prediction on Sothebys.com 04/2017
• Developed a price prediction scheme covering data extraction, feature engineering, model training, and validation
• Extracted 20 features from structured and unstructured data, including auction price, genre, description, and HoG image features, from ~3000 paintings in Python and RStudio
• Predicted winning auction prices with ordinary linear regression in R, then improved performance by introducing Lasso regression, which brought a lift of 55 percentage points in accuracy
Image Classification on Poodle and Fried Chicken 03/2017
• Extracted HoG features for classification from ~2000 images in RStudio
• Developed and compared 10 models, including GBM, SVM, Random Forest, AdaBoost, and XGBoost, using SIFT and HoG features; the best and most efficient model, HoG + SVM, showed a consistent error rate of 8% in training and cross-validation
LEADERSHIP
Director of Career Development, Columbia Statistics Club 09/2016 – 12/2017
• Led event planning for hackathons (over 200 participants in total) and coordinated a team of 10 handling sponsor communication, registration follow-up, food and room reservations, etc.
• Supervised 5 teammates to gather interview questions, reserve rooms and manage registration tickets for mock interviews