Zhimin (Florence) Sun
412-***-**** *****.****@*****.*** Jersey City, NJ 07305
Legally Authorized to Work in the U.S.
Technical Skills
PROGRAMMING LANGUAGES: Python, SQL, MATLAB
DATA SCIENCE PACKAGES: Pandas, NumPy, SciPy, Scikit-Learn, Requests, Beautiful Soup, Matplotlib, Plotly, Bokeh, Flask, PySpark, TensorFlow
MACHINE LEARNING: Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-means, SVM, Gradient Descent, Time Series Forecasting, Web Scraping, Feature Engineering
Education
Ph.D. in Mechanical Engineering University of Pittsburgh August 2019
M.S. in Materials Science and Engineering Xi’an Jiaotong University July 2013
B.S. in Materials Physics Xi’an Jiaotong University July 2010
Data Science Projects
NYC motor vehicle collision forecast
• Collected, cleaned and structured real-world data from the NYC OpenData website.
• Developed time series forecasting models to extract insights from the data and predict collision numbers; achieved an R² score of about 0.5 for NYC as a whole.
• Automated weekly data updates, allowing the web app to use historical data from any day in the past week to predict collision numbers one week ahead for each of the five NYC boroughs.
• Used a word cloud to identify the leading collision causes: “inattention distraction”, “following closely” and “failure to yield”.
• Built the real-time web app with Flask and deployed it on the Heroku cloud application platform, publicly available at
https://nyc-collision-forecast.herokuapp.com/
Celebrity social network analysis
• Analyzed the social network of celebrities by scraping more than 100,000 photo captions from the archived New York Social Diary website.
• Used regular expressions, Beautiful Soup and spaCy to accurately parse names from photo captions.
• Built a graph model and identified the most influential celebrity using PageRank.
Experience
THE DATA INCUBATOR January 2020 – March 2020
Fellow
• Used Python libraries and SQL to gather, clean, organize and analyze messy real-world data.
• Wrote complex SQL queries to extract information from an NYC database of restaurant inspections, revealing common types of violations.
• Developed a machine learning model by designing custom estimators and transformers, and built an ensemble model combining several smaller models to achieve better performance.
• Parsed, cleaned and processed a 10 GB set of XML files of user actions on a Q&A website; trained a word2vec model and a classification model on tags associated with questions; implemented a machine learning pipeline using Spark ML.
UNIVERSITY OF PITTSBURGH September 2014 – August 2019
Doctoral Researcher
• Designed a new electromechanically actuated refrigerator to satisfy special cooling requirements in the electronics and mechanical industries.
• Proposed a novel mathematical model to analyze and optimize device performance; analyzed different groups of materials data and identified the properties with the greatest impact on device performance.
• Taught several undergraduate lab classes and maintained lab equipment at the university for 4 years.
• Peer-reviewed more than 10 manuscripts for top scientific journals, and wrote 4 papers for international conferences and journals.
Achievements
• International Student Representative, January 2018 – December 2018: Planned and executed professional seminars, inviting alumni to share their work and industry insights with over 100 student attendees.
• Fellowship (2019); Student Honoree at PITT Honors Convocation (2019); Outstanding Reviewer Certificate (2017)