SHENGWANG (ARTHUR) ZHANG
Los Angeles, CA 805-***-**** ********@***.***
EDUCATION
University of Southern California (USC) Expected Graduation: 05/2022
Master of Science in Applied Data Science
University of California, Santa Barbara (UCSB) 09/2016 – 03/2020
Bachelor of Science in Statistics and Data Science
SKILLS & COURSEWORK
Relevant Coursework: Data Mining; Machine Learning; Foundations of Data Management; Data Science at Scale; Regression Analysis; SAS Base Program; Data Science Computing
Data Analysis Skills: SQL, Python, R, Spark, Hadoop, Pandas, Tableau, Machine Learning, Database Manipulation
WORK EXPERIENCE
Quality Assurance Data Analyst Intern 09/2020 – 11/2020
Broadstreet COVID-19 Data Project Los Angeles, CA
Collaborated with the data entry team to accurately record the daily number of COVID-19 cases in Google Sheets
Corrected over 400 data points showing implausible downward trends in confirmed cases to ensure the integrity of the dataset
Built Linear Regression models to predict the future number of cases and compared the predictions with real-world datasets
Data Analyst Intern 07/2019 – 09/2019
Shaanxi Help You Electronic Technology Co. Ltd Xi’an, China
Deployed Hadoop and Spark clusters with Docker Compose, reducing data computing time by 13%
Analyzed DAU, Engagement, and Elevator Running Time metrics to gain actionable insights for the ad campaign
Presented data analysis conclusions to the marketing team and worked with them to increase the acquisition rate by 20%
Operation Specialist 04/2020 – 06/2020
Ezeeship Los Angeles, CA
Acquired 10 registered customers and increased sales by 20% in two weeks, raising company awareness
Initiated 3 demos introducing new system features to existing customers, increasing the conversion rate by 10%
Developed an email campaign using Mailmeteor to promote the service, gaining 20 new customers
PROJECTS
Big Data Analysis of YouTube Videos – Data Management 09/2020 – 11/2020
Managed datasets from various databases on an AWS instance to evaluate the most popular videos across diverse attributes
Preprocessed and cleaned data on over 60,000 videos and channels using PySpark, handling missing values and standardizing variables, to reduce data processing time by 20%
Aggregated datasets stored in MySQL and Firebase databases into an integrated YouTube dataset for the user interface
Binary Prediction of NBA Player’s Shot Outcome – Machine Learning 09/2020 – 11/2020
Built Logistic Regression, Decision Tree, and XGBoost binary classification models to predict Steph Curry’s shot outcome
Used k-fold Cross-Validation to measure and compare the mean accuracy of each model, then selected the Logistic Regression model, which had the highest accuracy at 68%
Applied the final model on Spark to players in all positions to test how broadly the model holds
Analysis of International Airline Passengers – Time Series 01/2019 – 03/2019
Developed a Time Series model predicting that international passenger numbers would grow about 20% over the next six seasons
Checked normality for the final model by plotting a Histogram and Q-Q Plot and performing the Shapiro-Wilk Test
Performed the Ljung-Box Test to check independence and drew ACF and PACF plots to check constant variance
2016 Election Analysis – Machine Learning 10/2019 – 12/2019
Created county and state-level data visualizations to gain insights into party preferences of different states and counties
Applied Principal Component Analysis to reduce the data dimensionality and concluded that poverty and income per capita were the most important features contributing to the election result
Implemented a hierarchical clustering algorithm to determine the ideal number of clusters for grouping counties