Peiyao Zhu
206-***-**** *******@*****.*** linkedin.com/in/peiyao-zhu Los Angeles, CA
EDUCATION
University of Southern California Los Angeles, CA Dec 2024
Master of Science in Applied Data Science GPA: 3.75/4
● Courses: Experimental Design, Natural Language Processing, Deep Learning, Big Data, Data Mining, Database Systems
University of Washington Seattle, WA June 2022
Bachelor of Science in Applied Mathematics Minor: Data Science, Informatics GPA: 3.71/4
● Courses: Data Analysis, Data Modeling, Data Structures and Algorithms, Machine Learning, Predictive Analytics, Statistics
SKILLS
Programming: Python, SQL, MATLAB, R, Scala, Java
Libraries & Frameworks: Pandas, NumPy, PySpark, NLTK, statsmodels, BeautifulSoup, Scrapy
Data Science: Data Cleaning, Data Mining, Data Modeling, Time Series, Quantitative Analysis, Causal Inference
Databases: PostgreSQL, MySQL, NoSQL (MongoDB)
Version Control: Git
Machine Learning: Scikit-Learn, Keras, PyTorch, TensorFlow, OpenCV, Unsupervised/Supervised Learning, Regression, Classification, Clustering, Natural Language Processing
Big Data & Cloud: Hadoop, Spark, AWS (Redshift, SageMaker, DynamoDB, EC2, S3), GCP (BigQuery, Cloud Storage)
Tools: Tableau, Power BI, AWS QuickSight, Excel, D3.js, Matplotlib, Seaborn, ggplot2, Google Analytics
PROFESSIONAL EXPERIENCE
Amazon Seattle, WA
Business Intelligence Engineer Intern May 2024 - Aug 2024
● Drove a 30% reduction in brand infringement by delivering an interactive deep-dive analytics dashboard in AWS QuickSight that 50+ users on the Brand Protection team used for KPI monitoring and abuse risk management
● Reduced manual effort by 18 hours per week by building 10+ ETL pipelines leveraging DataNet, consolidating raw seller-related data from 30+ tables into 250+ KPIs for streamlined analysis and reporting
● Wrangled 15TB+ of seller-related data stored in the AWS Redshift cloud data warehouse using advanced SQL queries to extract detail-level infringement metrics, accelerating root cause analysis by 40%
● Analyzed historical seller infringement data using statistical analysis and data visualization to identify risk trends in catalog abuse and recommend actionable risk management strategies
● Collaborated with a cross-functional team and stakeholders to align data initiatives with business objectives; conducted quality assurance to ensure dashboard data matched previous Excel calculations
● Presented business reports to multi-level leadership and non-technical stakeholders; wrote documentation detailing data pipeline design, ETL processes, and dashboard functionality, facilitating cross-team communication
China Guangfa Bank Foshan, CN
Data Scientist Intern Sept 2020 - Nov 2020
● Saved over $70K in annual marketing budget by analyzing and visualizing 100M user-related data points on historical in-app events in Python to inform raffle event marketing strategies
● Boosted operational efficiency by 70% by automating reporting processes with Power BI dashboards that visualized conversion rates and user retention patterns for feature development insights
● Raised user satisfaction rate by 15% by delivering a K-Means clustering model that segmented users based on 100M historical behavioral data points to improve personalization of marketing recommendations
● Cooperated cross-functionally with product managers, mobile developers, and marketing teams to implement data collection protocols for in-depth user behavior analysis, resulting in more targeted product improvements
PROJECT EXPERIENCE
EyeLabs.AI, Opticalc: AI-Driven IOL Power Prediction Machine Learning, Supervised Learning, Predictive Model
● Reduced surgical expenses by 50% by developing a lens power predictive model (Linear Neural Network) in PyTorch, achieving an R-squared of 0.64 and improving upon conventional methods by 14%
● Performed exploratory data analysis on 1,000+ patient records using Python (NumPy, Pandas) to clean and analyze data for modeling; applied MICE imputation for missing values and IQR capping for outliers to ensure high-quality model inputs
Yelp User Business Reviews Recommendation System Python, Data Mining, Collaborative Filtering, Spark, MapReduce
● Executed data cleaning and data mining on 1.3M+ business, review, and user records in Python; conducted feature engineering through distributed computing with PySpark to create 30+ features (e.g., location, delivery options) for model training
● Devised a hybrid recommendation system to predict ratings for user-business pairs by implementing item-based collaborative filtering with Pearson similarity and a tree-based model (XGBoost) in Python, achieving an RMSE of 0.96
eBay Product Trends Analysis R, Shiny, Data Analysis, Data Visualization
● Developed an R Shiny application with regional sales analysis, displaying top-performing products by region and monitoring trending search terms, resulting in actionable market intelligence for inventory planning
● Analyzed and visualized 300K+ marketplace records with dplyr and ggplot2 to identify price anomalies, seasonal trends, and regional purchasing patterns