Data Analyst

Milpitas, CA, 95035
January 08, 2020


Yuhong (Vicky) Zhai

669-***-**** *** E Weddell Drive, Sunnyvale, CA 94089

SUMMARY

• Proficient in R, Python (Scikit-Learn, NumPy, Pandas, Seaborn), SQL, Advanced Excel (Pivot Table), SAS, VBA, Google Analytics, Tableau

• 2 years of working experience in data manipulation, hypothesis testing, regression analysis, predictive modeling, machine learning, and econometrics

• A hardworking, self-starting, multi-tasking, and detail-oriented problem solver who embraces opportunities and challenges; enjoys fast-paced work environments and collaborating with people from different teams

EDUCATION

Texas A&M University, College Station, TX 08/2017 - 05/2019

Master of Science, Quantitative Economics and Econometrics (STEM-Designated), GPA: 3.55/4.0

Coursework: Analyzing Big Data I (SQL and Data Visualization), Analyzing Big Data II (Machine Learning), Time Series Forecasting in Finance

Southwestern University of Finance and Economics, Chengdu, China 09/2013 - 07/2017

Bachelor of Economics

Coursework: Micro- and Macroeconomics, Statistical Inference, Linear Algebra, Probability Theory, Econometrics (Python), Risk Management

WORK EXPERIENCE

The Galindo Group, Bryan, TX Data Analyst Intern 05/2018 - 12/2018

• Performed exploratory data analysis and built a multiple linear regression model in Python to predict housing prices from size, unit type, lease, unit number, and customer information, generating actionable insights for the marketing team

• Explored algorithms to extract features from raw data and calculated key metrics, such as monthly occupancy rate, to analyze seasonal trends

• Evaluated model performance and reported accuracy with MSE and adjusted R-squared; reduced prediction error by 3% using regularized regression

Texas A&M University, College Station, TX Research Assistant 09/2019 - Present

• Used Python to build linear trend, quadratic trend, exponential smoothing, and ARIMA models based on monthly CPI data

• Used the ACF and the Durbin-Watson (DW) test to check for autocorrelation in the variables and residuals, and the ADF test to check the residuals for a unit root

• Made one-step-ahead and multi-interval forecasts for each model, and used the DB function to test accuracy

• Evaluated forecast accuracy on the training and test sets; ARIMA proved to be the best model, with the lowest root mean square error (RMSE)

Baidu Eats, Beijing, China SQL Analyst 12/2015 - 03/2016

• Manipulated data in SQL and calculated key metrics using window functions and common table expressions (CTEs)

• Wrote SQL queries to calculate monthly active users, retention rate, and growth rate to measure business performance

IBM China Consultant, Chengdu, China Business Analyst 08/2016 - 01/2017

• Predicted which previously purchased products would be in a user's next order, using customers' order history

• Created new user-level, product-level, and order-level features for a random forest model

• Optimized the products selected in each prediction by maximizing expected F1 score, using grid search with cross-validation to tune parameters

China Life Asset Management Company LTD, Hohhot, China Quantitative Analyst 11/2014 - 03/2015

• Improved and maintained the workflow for daily price updates of financial products on company websites, automated the daily validation of the risk model, and presented the sorted findings in meetings with senior management

• Ran Monte Carlo simulations in Python for value-at-risk to predict initial margins for trade positions, achieving a much more accurate prediction of portfolio market exposure


Credit Card Default Modeling 10/2019 - 12/2019

• Performed exploratory analysis, data transformations (scaling and missing-value imputation), feature selection, and data visualization

• Developed machine learning models, including logistic regression and random forest, to predict credit default, achieving an accuracy of 83.7%

• Evaluated model performance via cross-validation and used grid search to tune hyperparameters, which improved prediction accuracy by 3%

Web Traffic Forecasting 11/2019 - 12/2019

• Forecasted future web traffic time series for approximately 145,000 Wikipedia articles from the past 26 months of data

• Built the forecast from medians of sliding windows of the data, adjusted for the month-over-month increase and the weekend vs. weekday difference

• Presented the results and explained data insights to a non-technical audience based on customer reviews

CERTIFICATES

SAS Certified Base Programmer for SAS 9

FRM Level 1 Candidate