Yichi (Sylvia) Zhang
Address: Greenbelt, MD ***** Phone: 812-***-**** *****.*****@*******.***.***
EDUCATION
University of Maryland, R. H. Smith School of Business, College Park, MD Dec 2018
Master of Science in Business Analytics; GPA: 3.56/4.00
Selected Courses: Database Management (A); Data Mining and Predictive Analytics (A); Big Data and AI (A); Data Processing in Python
Indiana University, College of Arts & Sciences, Bloomington, IN May 2016
Bachelor of Science in Apparel Merchandising (Fashion Marketing)
Honors: Leah Weidman Scholarship, Phi Sigma Theta National Honor Society, Executive Dean’s List
Minor(s): Marketing, German
TECHNICAL SKILLS
Languages: Python, R, SQL, SAS, Unix, JavaScript, HTML5
Big Data: Spark, Pig, Hive, Hadoop, MapReduce, AWS (S3, EC2, EMR, Redshift, Aurora, VPC)
Statistics and Modeling: Linear Regression, Logistic Regression, Random Forest, PCA, Clustering, XGBoost, Time Series, Hypothesis Testing, A/B Testing, Natural Language Processing, Latent Semantic Analysis, Image Processing
Tools and Others: Agile methodology, Scrum, CRISP-DM, Alteryx, Tableau, Power BI, Excel (Advanced), Google Analytics
EXPERIENCE
Strategy & Performance Analysis Intern II AARP, Washington DC Jan 2018–Dec 2018
Led prediction projects for the membership analysis team using machine learning models; conducted data ETL and built visualization reports for the Social Engagement Community (SEC), Advocacy, and Multicultural teams, archiving versions on SharePoint
Collected raw state-branch data (230K rows), transformed it into the national office’s database, and preprocessed it with Python (NumPy, pandas), including accuracy verification, missing-value imputation, and text format standardization
Extracted customer email information (15,000,000 rows) for analytics team from Redshift using SAS queries
Forecast the membership acquisition response rate of the direct-mail channel for 3Q18–FY19 using a time series model, helping the analyst team plan the marketing acquisition budget
Predicted event return rates with Linear Discriminant Analysis and Random Forest in Python (scikit-learn) from historical data on 100,000 events with 30 features, achieving 86.4% accuracy; drafted event promotion strategy reports for the marketing team
Raised the relevancy score of the company’s internal search results by 62% by creating step-by-step tagging guidance and keyword search lists
Data Scientist Associate CITIC Co. Ltd, San Jose, CA Sep 2016–Aug 2017
Trained 1,000 local store managers and associates across 220 stores in China in data logging to improve data quality
Cleaned over 600,000 raw sales and transaction records collected from offline and online channels using SQL queries
Implemented a “customer shopping footprint” dashboard based on the profiles and purchase behavior of over 450,000 customers, using Navicat Premium (SQL database), Tableau, and HTML5, as a prototype for the engineering team; iterated on the functions of the initial Java-based platform based on user feedback
Drafted reports analyzing typical personas of offline customers and presented them to stakeholders
Operation Analyst Intern ELLASSAY Fashion Co. Ltd, Shenzhen, China Jul 2014–Aug 2014
Collected sales performance data of 239,000 SKUs including current sales, stock turnover, and customer satisfaction
Optimized the inventory distribution plan based on the data above to maximize total revenue
Summarized fashion trends for 2015 F/W by collecting statistics on clothing materials, colors, patterns, silhouettes, and themes, and reported them to the director of the design department
SELECTED PROJECTS
Text Analytics with Principal Financial: built an AI model to detect whether texts scraped from third-party sources concern government policy, trade restrictions, lawsuits, intellectual property, or trademarks, in order to provide reports for financial analysts:
Labeled parsed text over four weeks to set up training data, yielding over 2,000 usable texts
Created a data pipeline and experimented with traditional ML methods on numerical variables (Logistic Regression, Naive Bayes, SVM, Random Forest, Gradient Boosting); Random Forest returned the highest accuracy at 82%
Applied TF-IDF to the text data to create bag-of-words features and fed the resulting matrix into a new logistic regression model
Applied an ensemble method combining weights from the Random Forest and from the TF-IDF-based Logistic Regression to build a stronger model, achieving the highest accuracy of 88%
Increased the best accuracy from 76.02% (using only client-provided features) to 88% after feature engineering and ensembling
Recommended that the client use the model with the newly created features, drop unimportant features, and create more diverse tagging