Data Scientist

Location:
Cupertino, CA
Posted:
August 24, 2017


Resume:

Evelyn Peng

408-***-**** • ac1zzv@r.postjobfree.com • evelynpeng.site

evelyn-peng • evelynpeng • Cupertino, CA 95014

Work Experience

Home.ai Belmont, CA

Data Science Intern 10/2016-06/2017

• Developed a production-ready solution to identify frequently visited places from mobile GPS location data using DBSCAN and Gaussian mixture clustering algorithms.

• Used logistic regression to predict users’ next destination given a departure place and time, improving the accuracy of the existing algorithm by 30%.

• Built neural network models that use time-series and environmental data to predict the optimal states and actions of home devices, anticipating users’ needs in an autonomous home.

Hewlett-Packard Taipei, Taiwan

Supply Chain Analyst and Product Coordinator 7/2014-6/2016

• Implemented a VBA (Visual Basic for Applications) algorithm to increase operational efficiency and accuracy, which saved the company four person-hours per day by reducing errors in data entry. Nominated for Director’s Quarterly Award by team manager due to exceptional performance.

• Collaborated with regional and global business unit planning teams to facilitate data collection and performance measurement. Completed, within two weeks, a report supporting the cost team’s request for a budget forecast.

Boston University Boston, MA

Research Assistant for Professor Kehinde F. Ajayi 2/2013-9/2013

• Performed exploratory data analysis and implemented a data cleaning pipeline for the Republic of Ghana Project, with a focus on secondary school admissions. Used Stata to analyze the data with regression models and two-way tables.

Projects

Data Acquisition: Recommendation system, A/B test

SFpal - http://sfpal.life/ 05/2017

• Built a recommendation system to help users find communities and neighborhoods that best fit their needs, using a collaborative filtering algorithm and Jaccard similarity. Collected and scraped data on property details, socioeconomic information, and police reports from the Zillow API, City Data, and the SF Police Department.

• Implemented multi-armed bandit experiments (A/B testing) with Google Analytics (certified) to investigate the effect of the home page image on user conversion rate.

Data Visualization: R Shiny

Airbnb SF dataset 05/2017

• Visualized an Airbnb dataset of 8,706 San Francisco listings from 2008-2017 using ggplot and R Shiny, with an interactive geo-scatter plot, Sankey diagram, and parallel coordinates plot.

Distributed Computing: Distributed computing, Classification

Cuisine type classification 01/2017

• Used Spark ML (decision tree) and Spark SQL to predict cuisine type from a list of ingredients, running distributed on AWS EMR and S3. Achieved 50.52% accuracy across 20 cuisine types.

Skills

Programming: Python (NumPy, pandas, scikit-learn, Theano, Keras), R, Shell script, Stata, VBA

Distributed Computing: Apache Spark (Spark SQL, Spark ML), Databricks

Database: PostgreSQL, MongoDB, Cassandra

Visualization: Matplotlib, R Shiny, Plotly, ggplot

Tools: Amazon Web Services (EC2, S3, EMR), Git, Google Analytics, Google AdWords, LaTeX, Microsoft Office

Education

University of San Francisco San Francisco, CA

M.S. in Analytics, Awarded Jun. 2017 7/2016-6/2017

• Relevant coursework: Linear Regression, Machine Learning, Distributed Computing (Spark), Time Series Analysis, SQL & NoSQL, Natural Language Processing, Experimental Design.

Boston University Boston, MA

M.A. in Economics, Awarded Jan. 2014 9/2012-1/2014

National Taiwan University Taipei, Taiwan

B.A. in Economics, Awarded May 2012 9/2007-6/2011


