
Location:

Jersey City, NJ

Posted:

March 27, 2020

Resume:

Zhimin (Florence) Sun

412-***-**** *****.****@*****.*** Jersey City, NJ 07305

Legally Authorized to Work in the U.S.

Technical Skills

PROGRAMMING LANGUAGES: Python, SQL, MATLAB

DATA SCIENCE PACKAGES: Pandas, NumPy, SciPy, Scikit-Learn, Requests, Beautiful Soup, matplotlib, plotly, bokeh, Flask, PySpark, TensorFlow

MACHINE LEARNING: Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-means, SVM, Gradient Descent, Time Series Forecasting, Web Scraping, Feature Engineering

Education

Ph.D. in Mechanical Engineering University of Pittsburgh August 2019

M.S. in Materials Science and Engineering Xi’an Jiaotong University July 2013

B.S. in Materials Physics Xi’an Jiaotong University July 2010

Data Science Projects

NYC motor vehicle collision forecast

• Collected, cleaned and structured real-world data from the NYC OpenData website.

• Developed time series forecasting models to extract insights from the data and predict collision counts; assessed model performance, with an R² score of about 0.5 for NYC as a whole.

• Automated weekly data updates, allowing the web app to use historical data from any day in the past week to predict collision numbers for the five NYC boroughs one week ahead.

• Used a word cloud to identify the most common collision causes: “inattention distraction”, “following closely” and “failure to yield”.

• Deployed the real-time web app on the Heroku cloud application platform using Flask, available to the public at

https://nyc-collision-forecast.herokuapp.com/

Celebrity social network analysis

• Analyzed the social network of celebrities by web scraping more than 100,000 photo captions from the archived websites of New York Social Diary.

• Used regular expressions, Beautiful Soup and spaCy to accurately parse names from photo captions.

• Built a graph model and determined the most influential celebrity by PageRank.

Experience

THE DATA INCUBATOR January 2020 – March 2020

Fellow

• Used Python libraries and SQL to gather, clean, organize and analyze messy real-world data.

• Wrote complex SQL queries to extract information from an NYC database of restaurant inspections, revealing common types of violations.

• Developed a machine learning model by designing custom estimators and transformers, and built an ensemble model combining several smaller models to achieve better performance.

• Parsed, cleaned and processed a 10 GB set of XML files of user actions on a Q&A website; trained a word2vec model and a classification model on tags associated with questions; implemented a machine learning pipeline using Spark ML.

UNIVERSITY OF PITTSBURGH September 2014 – August 2019

Doctoral Researcher

• Designed a new electromechanically actuated refrigerator to satisfy special cooling requirements in the electronics and mechanics industries.

• Proposed a novel mathematical model to analyze and optimize device performance; analyzed different groups of materials data and identified the properties with the greatest impact on device performance.

• Taught several undergraduate lab classes and maintained lab equipment at the university for 4 years.

• Peer-reviewed more than 10 manuscripts for top scientific journals, and wrote 4 papers for international conferences and journals.

Achievements

• International Student Representative, January 2018 – December 2018: Planned and executed professional seminars, inviting alumni to discuss their work and industry insights with over 100 student attendees.

• Fellowship (2019); Student Honoree at PITT Honors Convocation (2019); Outstanding Reviewer Certificate (2017)
