Yigong (Leo) Liu
** ***** **, ******, ** ***** 215-***-**** ac3b8r@r.postjobfree.com
Objective
Statistics master's student actively seeking full-time opportunities in data science and machine learning.
Summary
Self-motivated, flexible, and amicable statistics master's student with a strong educational and practical background in statistics, data science, biomedical science, and clinical medicine.
Familiar with machine learning techniques: Linear Regression, Logistic Regression, Lasso, Ridge, Elastic net, Decision Tree, Random Forest, Boosting, LogitBoost, MART, K-NN, SVM, Naïve Bayes, Cross-Validation, etc.
Well versed in tools such as R, Python (pandas), SQL, Spark, Hadoop MapReduce, SAS, and MATLAB.
Education
Master of Science, Statistics and Biostatistics Dec 2017
Rutgers University – New Brunswick, NJ
Master of Science, Biomedical Engineering May 2012
Drexel University, Philadelphia, PA
Bachelor of Medicine, Clinical Medicine Jun 2009
Shanghai Jiao Tong University – School of Medicine, Shanghai, China
Programming Tools
R: caret, glmnet, ggplot2, dplyr, tidyr, reshape2, lubridate, data.table, stringr, etc.
Python: pandas, numpy, scipy, re, sqlite3, etc.
Spark: textFile, parallelize, map, flatMap, groupByKey, reduceByKey, filter, foreach, collect, etc.
Hadoop: Spark, MapReduce, HDFS, YARN, Sqoop, Impala, Hive, etc.
SQL: create, select, from, where, group by, having, order by, join, if, case, etc.
SAS: macro, data step, proc gplot, proc report, proc freq, proc logistic, proc glm, etc.
Related Working Experience
Research Assistant Jun 2017 – Present
Hunter College CUNY, Computer Science, New York, NY
Study and manage fMRI image data (about 250 GB) and high-dimensional DNA datasets. Create and evaluate multiple regression models and tree models using machine learning techniques.
Related Projects
Build classifiers for high-dimensional DNA data to predict Alzheimer’s disease
This project was implemented in R. I analyzed a patient DNA dataset with more than 20,000 variables. Multiple classifiers were built using Lasso and Elastic Net logistic regression and Random Forest. For each model, cross-validation was used to select the model parameters. Finally, the models were evaluated by plotting an ROC curve for each.
Design and implement logistic regression and MART algorithms for image classification
The algorithms were coded in MATLAB. In this project, logistic regression and MART models were built to classify zip-code images. Each zip-code image was first reshaped into a single row of the dataset. The logistic regression and MART algorithms were then designed and implemented.
Target potential high-sales customers using cluster analysis
This project was implemented in SAS. Sales information for current customers was analyzed. Factor analysis was first applied to reduce dimensionality. Cluster analysis was then performed, and the cluster centers of the high-sales cluster were calculated. Finally, potential high-sales customers were identified as those closest to the calculated cluster centers.
Publications
Authored five published papers, including:
Y. Liu, Q. Hamid, J. Snyder, C. Wang, and W. Sun, “Evaluating Fabrication Feasibility and Biomedical Application Potential of in situ 3D Printing Technology,” Rapid Prototyping.