Fei Liu
Phone: +1-313-***-****
Mountain View, CA 94043
**********@*****.***
https://github.com/feiliu23
Experience
present
Data Scientist Intern, Isazi Consulting, South Africa, Remote.
– Predicted tuberculosis from chest X-rays in PyTorch, reaching 84% accuracy with a convolutional neural network (CNN) built from scratch and 88% accuracy with transfer learning.
– Detected 14 advanced lung diseases using a pre-trained CNN model on over 100,000 chest X-rays released by the NIH.
2/16 – 7/16 Data Analyst, American Institutes for Research, San Mateo, California.
– Conducted ad hoc data analysis in SAS and R to help educational researchers and policy makers deepen their understanding of the state of US PK-12 education, including gender gaps and educational inequality.
– Developed and executed data processing plans, and automated data review, reporting, and documentation generation to support the Department of Education's annual data product releases.
11/10 – 2/16 Statistical Analyst, MacroSys LLC, Washington DC.
– Measured key metrics on US PK-12 student academic performance, contextual behavior, and school and teacher characteristics using NAEP's complex multi-stage survey data.
– Built an analytical tool in R that enabled researchers with less programming experience to analyze NAEP data more efficiently; also guided junior staff.
– Fitted multiple linear regression models with fixed and random effects, along with factor analysis, to predict the difficulty of Grade 4 and 8 reading items from Coh-Metrix factors, improving the effectiveness of the NAEP reading assessment.
– Validated the NAEP mathematics and history assessment frameworks by implementing a stratified bootstrap resampling method that generated 10,000 hypothetical assessment samples to measure the variability of mathematics framework content coverage; validated the results against the five most recent assessments.
1/10 – 9/10 Research Data Analyst Intern, Academy for Educational Development, Washington DC.
– Extracted education indicators from household surveys using Stata, SPSS, and SQL to gain insight into the education quality of local school districts.
Projects
Yelp review sentiment analysis: Predicted customer sentiment from features extracted from 3.7 GB of Yelp text reviews using Naive Bayes, logistic regression, and neural network models in PySpark on an AWS EMR instance, achieving 89% accuracy and an AUC of 0.91.
San Francisco parking prediction: Predicted San Francisco parking spot availability using random forest and XGBoost models and a clustering algorithm on historical parking sensor and meter data with geolocation and time information. Reported feature importance and achieved an F2 score of 0.56.
Photo scoring web application: Built a web application that predicts photo popularity using a CNN model trained on Flickr datasets.
Education
2017 – 2018 University of San Francisco, M.S. in Data Science, expected June 2018.
2008 – 2010 George Washington University, M.S. in Statistics.
2004 – 2008 China Agricultural University, B.S. in Mathematics.
Technical Skills
Statistics: Linear Regression, Logistic Regression, Multivariate Data Analysis, PCA, Survey Sampling, Missing Data Imputation, Time Series, Experimental Design, and Bayesian Inference
Machine Learning: Random Forest, Gradient Boosting, Support Vector Machine, Clustering, Recommendation System, Natural Language Processing, Neural Network, and Deep Learning
Programming Languages and Tools: Python, R, SQL, Spark, SAS, Julia, and Tableau
Certifications
SAS Certified Advanced Programmer