YUJIE HU
Bethlehem, PA ************@*****.*** 484-***-****
EDUCATION
Lehigh University Bethlehem, PA
Master of Science in Statistics, Mathematics May 2022
GPA: 3.74/4.0 Dean’s List
Relevant Coursework: Time Series, Statistical Machine Learning, Linear Models in Statistics
Ocean University of China Qingdao, China
Bachelor of Applied Mathematics Jun 2020
GPA: 3.27/4.0
SKILLS
Data Analytics: Statistics, Data Modeling, Data Automation, Data Visualization, A/B Testing
Machine Learning Models: Multiple Linear Regression, Logistic Regression, LDA, Nonparametric Regression (Local Linear, Nadaraya-Watson, Spline), PCA, KNN
Technical Skills: Python (pandas, numpy, matplotlib, statsmodels, sklearn, seaborn), R (olsrr, car, glmnet, gam, dplyr, PerformanceAnalytics, ggplot2, tidyr, splines, ISLR, MASS), SQL (where, group by, having, order by, count, max, sum), SAS (base, stat, sgraph, macro, sql, iml, iml+), SPSS, MATLAB
PROFESSIONAL EXPERIENCE
Jiangxi ISUZU Motors Co., Ltd. Jiangxi, China
Data Analyst Intern Jan 2019 – Aug 2019
●Collected inventory and sales data from different sources and built databases by producing reusable and scalable scripts in SQL and Python, improving efficiency by 60%.
●Built insightful dashboards and reports for senior management to forecast, track and measure stock level, sales, cost, savings, and profit.
●Participated in monthly business process improvement meetings with cross-functional teams and reduced reporting time by 45% by revamping existing Excel reports.
Nanchang Investigation Team of National Bureau of Statistics Jiangxi, China
Data Analyst Intern Jul 2018 – Sep 2018
●Researched local malls and food trucks to gather sales data.
●Transformed research data into standard formats using Python, reducing manual intervention by 70%.
●Analyzed research data to estimate correct tax amounts, then summarized the insights in a 10-page report for management.
PROJECT EXPERIENCE
Study of People’s Happiness Based on Multiple Regression Model Jan 2022 – Apr 2022
●Performed data cleaning, transformation, data quality checks, and exploratory data analysis on data from 149 countries in Python and SQL.
●Identified the key factors that affected people's happiness and implemented a linear regression model in Python to determine the relationship between the evaluation index and key metrics.
●Tuned parameters and compared model performance using repeated cross-validation, achieving 92% accuracy.
Future Sales Prediction Oct 2021 – Dec 2021
●Preprocessed and analyzed 2M+ sales records for 200K products in R to identify key factors, performed feature engineering, and constructed correlation features.
●Discovered product categories with similar sales patterns by clustering the correlation vectors, then trained a model for each category using stepwise selection and lasso regression to refine the model's variables.
●Built a logistic regression model in R to predict the sales of 200K products for the next 3 months.
●Optimized the model using LOOCV, reaching an RMSE of 0.91 and 89.8% accuracy.