Data Scientist

Location:
San Francisco, CA
Posted:
June 08, 2018

Resume:

ZIZHEN (AMBER) SONG

SAN FRANCISCO, CA ***** 415-***-**** *****.**********@*****.*** www.linkedin.com/in/songzizhen/

EDUCATION

University of San Francisco June 2017 – June 2018

Master of Science in Data Science

• Machine Learning, NoSQL, Relational Databases, Data Acquisition, Time Series Analysis, Big Data Strategy, Data Visualization, Computational Statistics, Distributed Computing, Deep Learning

Hong Kong Baptist University September 2013 - May 2017

Bachelor of Science in Applied and Computational Mathematics, Minor in Computer Science

• Data Structure and Algorithms, Object Oriented Programming, Numerical Methods, Simulation, Probability and Statistics, Linear and Integer Programming, Experimental Design, Survival Analysis

SKILLS

• Programming: Python, R, SQL, MongoDB, Java, MATLAB, HTML5, CSS and JavaScript

• Packages: NumPy, Pandas, scikit-learn, SciPy, NLTK, TensorFlow, Keras

• Tools: Spark, AWS (S3, EC2, EMR), LaTeX, Git

• Data Visualization: Tableau, ggplot2, Plotly, Bokeh, Seaborn, Matplotlib

PROFESSIONAL EXPERIENCE

SnapLogic October 2017 - Present

Data Science Intern San Mateo

“Speech to Pipeline” Search and Recommendation System

• Designed and implemented an end-to-end search engine for the company's core products with NLP and Deep Learning (Speech-to-text, Bag-of-words, Word2Vec, TF-IDF, XGBoost, LSTM); researched and developed an end-to-end recommendation engine.

• Built a web application with JavaScript, HTML5, and the Python Flask library; presented it to the whole company, and the Sales team adopted it for product demos.

• Integrating production-quality code into the company platform.

Analytics and Visualization with a Cross-functional Team

• Architected a data pipeline that extracts data from MongoDB and Sumo Logic, cleans data, performs feature engineering, and stores data in both Postgres and AWS S3 on a monthly basis for future analysis.

• Created features from log records to cluster users (K-means), and visualized the user distribution using Python, ggplot2, and Google Data Studio to help the sales team target customers.

• Cleaned data, designed features, and used Python to identify the most useful product elements for R&D investment; visualized the results in ggplot2 and presented them to the product manager and analytics team.

Census and Statistics Department (C&SD) of Hong Kong June 2016 - July 2016

IT Intern Hong Kong, China

• Designed and coded a prototype of a C&SD responsive website with HTML5, CSS, JavaScript, and the Bootstrap framework.

PUBLICATION

Forecasting Smart Meter Energy Usage using Distributed Systems and Machine Learning April 2018

Co-first Author. The 16th IEEE International Conference on Smart City.

• Forecasted bi-hourly London smart meter usage (RMSE: 0.1659) with scalable Random Forest models.

• Established a data pipeline that automated the process of storing data on AWS S3, loading data into MongoDB on AWS EMR, processing data and feature engineering with Pandas and SparkSQL, and modeling with SparkML.

• Compared time and monetary costs through hyperparameter tuning of Random Forest models and YARN configurations on AWS EMR.

PROJECTS

Transformation from Images to Lego Bricks

• Built a web application that transforms images into Lego bricks, optimizes the purchase and assembly plan to minimize price by scraping real-time data from third-party sellers, and generates assembly instructions for users.

US Crime Visualization

• Visualized the crime rate in each US state from 2005 to 2016 with animated choropleth and line plots in Plotly.

Prediction of Canadian Bankruptcy Rates

• Forecasted Canada's monthly bankruptcy rate (1st group in class) using 22 years of historical data by applying time series approaches such as ARIMA, SARIMAX, VAR, Holt-Winters and Exponential Smoothing, and ensembling the top 2 models.

Twitter Sentiment Analysis

• Programmatically fetched tweets using the Twitter API and conducted sentiment analysis on them using Naive Bayes.


