R // SQL // AWS // Python // Tableau // PySpark

New York City, NY
October 21, 2020

Xuejun "Yuki" Zhang • 814-***-****


---M.S. in Data Science and Analytics

Georgetown University, Washington, D.C. GPA: 3.83/4.00 Anticipated Graduation Dec. 2020

---B.S. in Industrial Engineering, Minor in Product Realization

The Pennsylvania State University, University Park, PA GPA: 3.40/4.00 Graduated Dec. 2018


R // SQL // AWS // CSS // Hive // HTML // Spark // Linux // Scala // Kafka // Impala // Python // Hadoop // Microsoft Azure // Tableau // Machine Learning // Statistical Analysis // Natural Language Processing (NLP)


---NLP Course Teaching Assistant, Georgetown University, Washington, D.C. Aug. 2020-Present

• Communicate efficiently between students and the professor to clarify learning materials and assignments, and help students solve coding challenges in their Python projects

---Data Management Volunteer, 1point3acres Website, Remote May 2020-Jul. 2020

• Troubleshot daily coronavirus database errors covering over 3,000 U.S. counties on the website

• Compared coronavirus data collected by web crawlers against official sources to ensure data reliability

---Data Analyst, Georgetown Analytics for Non-Profits, Washington, D.C. Sep. 2019-Dec. 2019

• Designed and facilitated weekly team workshops to support a Smithsonian Institution professor's project on slowing the decline of Monarch butterfly populations by monitoring the butterflies' health conditions

• Calculated correlations between Monarch butterflies' health condition and 8 physical characteristics using Python, and visualized the results in Tableau

• Built predictive models to forecast the future health condition of Monarch butterflies, avoiding the need to kill over 2,000 live butterflies yearly for health monitoring

---Operations Analyst Intern, Cummins Inc., KY May 2019-Aug. 2019

• Led the audit-by-weight project to reduce dock audit time in Cummins warehouses and shortage and overage claims from customers, with an annualized project ROI (Return on Investment) of 37%

• Presented the audit time reduction plan using data storytelling to persuade the senior management team to approve a $50,000 purchase of machines to increase audit process efficiency


---Credit Card Approval Prediction Jan. 2020-May 2020

• Explored over 30,000 credit card applications visually in Tableau before data cleaning and transformation

• Applied forward/backward selection and LASSO to select 14 crucial features for the credit card approval process

• Implemented logistic regression and random forest models to predict credit card approval status from the selected features, and tuned the random forest model to reach 85% test accuracy

---Book Recommendation and Classification Jan. 2020-May 2020

• Scraped 40,000 book plots from the web and transformed words into matrices using Word2vec and bag-of-words models

• Applied PCA, LSA, and LDA to reduce the dimensionality of the word matrix for books with plots of 50-2,000 words

• Applied topic modeling to classify books into 10 genres and recommend similar books based on their plots


Graduate Merit-Based Scholarship Recipient, Georgetown University, Washington, D.C. May 2020

