ZEFENG ZHANG
ac0i5t@r.postjobfree.com 415-***-**** San Francisco, CA 94158
www.linkedin.com/in/zefeng-zhang https://github.com/zefeng-zhang
EXPERIENCE
Data Science Intern, Valor Water Analytics San Francisco, Mar 2017 - Present
Developed an inhomogeneous Markov Chain model to detect anomalies in time series data
Created dynamic buckets to aggregate 14M positively skewed data points in Redshift
Wrote production-grade code and performed unit testing with Python
Data Science Intern, LendUp San Francisco, Oct 2016 - Mar 2017
Developed an anomaly detection system for risk score distributions using statistical distances
Designed a permutation test to assess the effect of sample size on Hellinger distance
Wrote production-grade code with Python and built data pipelines to Redshift using Psycopg2
Research Assistant, National University of Singapore Singapore, Aug 2012 - June 2016
Conducted Weibull analysis of fracture toughness datasets with C++
Assessed minimum sample size based on stochastic simulations of fracture toughness data
Developed finite element models and carried out numerical analysis on Linux
EDUCATION
MS in Analytics, University of San Francisco Expected June 2017
Ph.D. in Engineering, National University of Singapore June 2016
B.Eng., Huazhong University of Science and Technology July 2012
Machine Learning, Time Series Analysis, Data Visualization & Tableau, Distributed Computing, Design of Experiments, Data Acquisition
SKILLS
Programming: Python (Sklearn, TensorFlow), R, PySpark, C/C++, MATLAB, Shell
Databases: PostgreSQL, Redshift, NoSQL (Cassandra, MongoDB)
Visualization: ggplot2, R Shiny, Plotly, D3, Tableau
Tools: AWS (S3, EC2, EMR), Git, Microsoft Excel
PROJECTS
Mobile Apps Installation Prediction for Vungle Mar 2017
Built a logistic regression model to predict CTR of mobile advertisements with Python on AWS
Applied hashing technique to encode high-cardinality features for 20M observations
Performed feature engineering and feature selection by categorical proportion variation
NYC Taxi Trip Analysis Jan 2017
Performed K-means clustering on pickup and drop-off locations with PySpark on AWS EMR
Visualized clustering results and created a district heatmap using Python packages bokeh and fiona
Prediction of Fantasy Football Points Nov 2016
Predicted game results with Python (Ridge Regression, Random Forest, and Gradient Boosting). The RMSEs are on the same order of magnitude as FantasyData.com predictions.
PUBLICATIONS
Zhang Z, Qian X. Effect of experimental sample size on local Weibull assessment of cleavage fracture. Fatigue Fract Eng M. 2016.