Yurui Sheng
Bellevue, WA ***** +1-509-***-**** *****.*****@*******.*** https://www.linkedin.com/in/yurui-sheng/
EDUCATION
Washington State University Pullman, WA
Master of Science in Mathematics (Data Analytics) GPA: 3.6/4.0 Jan 2018 – Dec 2019
• Courses: Linear Optimization, Applied Linear Models, Stat Methods in Research, Numerical Analysis, Time Series, Simulation Methods, Financial Modeling, Asset Pricing in Financial Engineering, Quantitative Risk Management, etc.
East China University of Science and Technology Shanghai, China
Bachelor of Science in Safety Engineering (Environment Engineering) Sep 2010 – Jun 2014
• Awards: Second-class Academic Scholarship (awarded to the top 10% of students in the Dept. of Resource and Environment at ECUST); First-class Scholarship for Social Work (awarded to the top 3 students in the Dept. of Resource and Environment at ECUST)
TECHNICAL SKILLS
Machine Learning: Classification (Logistic Regression)
Statistical Methods: Time Series, Regression Models, Hypothesis Testing
Software and Programming Languages: Python (pandas, numpy, scikit-learn), R, SQL, SAS, LaTeX
Visualization: Tableau, R (ggplot2)
WORK EXPERIENCE
Washington State University Pullman, WA
Teaching Assistant and Tutor Jan 2019 – Dec 2019
• Graded homework and exams for two courses – Numerical Analysis and Probability and Statistics
• Tutored undergraduates, helping up to 20 students per week with their Math/Stat homework
TE Connectivity (NYSE: TEL, Revenue: ~14 billion USD, Number of Employees: ~80,000) Shanghai, China
Data Analyst (Organization Development & Learning Team) May 2016 – Apr 2017
• Helped reduce annual training costs by 13% by combining similar training classes and negotiating with vendor partners
• Led the TE Continuous Improvement Program: reduced working time by 200 minutes per week by developing a training data system and simplifying the operation process, and increased employee satisfaction by 5% in 6 months by improving facility/handbook quality and providing continuous tracking resources
• Designed and implemented a database using SQL to store training vendor data for effective sourcing and tracking
• Prepared annual training cost and ROI analysis reports and budget forecasts in PowerPoint and Tableau for an executive audience
• Provided data-driven strategic planning advice and Tableau report visualizations to management stakeholders based on training needs/cost/trend analysis
SELECTED PROJECTS
Yelp Business Analysis Simulation in SQL Jan 2020 – Feb 2020
• Cloned a SQL database storing ~7 GB of Yelp business data across 6 tables covering business information, working hours, and users' review and tip records
• Analyzed records of customers who have never left tips, who have left more tips than reviews, or whose reviews were never marked as helpful/funny by other users, to understand customer behavior on Yelp
• Produced a ranking of the top 20 most-reviewed restaurants in different areas along with their average stars
Classification of Cancer Cells Mar 2019 – Apr 2019
• Preprocessed ~5 GB of data in RStudio, including checks for missing values and zero values
• Selected the tuning parameter lambda via cross-validation and built Logistic Regression Models under 3 regularization techniques – L1, L2, and Elastic Net Regularization
• Selected the Logistic Regression Model under L2 Regularization as having the best predictive performance (largest AUC ~0.91, lowest total probability of misclassification ~0.14, highest sensitivity ~0.95, highest specificity ~0.79) by comparing ROC curves and confusion matrices
Analyzing House Sales in King County, USA with R Jan 2019 – Feb 2019
• Processed house sales data from Kaggle to compare prices and views of houses with or without waterfront
• Constructed two hypothesis tests to compare prices and views of houses with or without waterfront, and performed a Chi-Square Test and a T-test, obtaining p-values < 2.2e-16, indicating that waterfront could be a crucial attribute for sales
• Proposed three Linear Regression Models, selected the one with the highest R-Squared value (~70%), and used it to predict future house prices
• Visualized the data using ggplot2 with boxplots for each attribute, a histogram of the price distribution, and bar plots of median price by attribute
Financial Modeling for Walmart Jan 2019 – Feb 2019
• Proposed financial planning assumptions using linear regression and Monte Carlo simulation based on 2009-2018 Walmart annual reports
• Constructed Pro Forma Financial Statements in a spreadsheet for the next five years
• Obtained the weighted average cost of capital (WACC) and intrinsic value using a 5-year free cash flow valuation