Data Scientist

Location:

San Francisco, CA

Posted:

May 28, 2017

Resume:

ZEFENG ZHANG

*********@****.*****.*** 415-***-**** San Francisco, CA 94158

www.linkedin.com/in/zefeng-zhang https://github.com/zefeng-zhang

EXPERIENCE

Data Science Intern, Valor Water Analytics San Francisco, Mar 2017 - Present

Developed an inhomogeneous Markov Chain model to detect anomalies in time series data

Created dynamic buckets to aggregate 14M positively skewed data points in Redshift

Wrote production-grade code and performed unit testing with Python

Data Science Intern, LendUp San Francisco, Oct 2016 - Mar 2017

Developed an anomaly detection system for risk score distributions using statistical distances

Designed a permutation test to assess the effect of sample size on Hellinger distance

Wrote production-grade code with Python and built data pipelines to Redshift using Psycopg2

Research Assistant, National University of Singapore Singapore, Aug 2012 - June 2016

Conducted Weibull analysis of fracture toughness datasets with C++

Assessed minimum sample size based on stochastic simulations of fracture toughness data

Developed finite element models and carried out numerical analysis on Linux

EDUCATION

MS in Analytics, University of San Francisco Expected June 2017

Ph.D. in Engineering, National University of Singapore June 2016

B.Eng., Huazhong University of Science and Technology July 2012

Machine Learning, Time Series Analysis, Data Visualization & Tableau, Distributed Computing, Design of Experiments, Data Acquisition

SKILLS

Programming: Python (Sklearn, TensorFlow), R, PySpark, C/C++, MATLAB, Shell

Databases: PostgreSQL, Redshift, NoSQL (Cassandra, MongoDB)

Visualization: ggplot2, R Shiny, Plotly, D3, Tableau

Tools: AWS (S3, EC2, EMR), Git, Microsoft Excel

PROJECTS

Mobile Apps Installation Prediction for Vungle Mar 2017

Built a logistic regression model to predict CTR of mobile advertisements with Python on AWS

Applied hashing technique to encode high-cardinality features for 20M observations

Performed feature engineering and feature selection by categorical proportion variation

NYC Taxi Trip Analysis Jan 2017

Performed K-means clustering on pickup and drop-off locations with PySpark on AWS EMR

Visualized clustering results and created a district heatmap using Python packages bokeh and fiona

Prediction of Fantasy Football Points Nov 2016

Predicted game results with Python (Ridge Regression, Random Forest, and Gradient Boosting). The RMSEs are on the same order of magnitude as FantasyData.com predictions.

PUBLICATIONS

Zhang Z, Qian X. Effect of experimental sample size on local Weibull assessment of cleavage fracture. Fatigue Fract Eng M. 2016.
