Data Scientist

Location:
South Brunswick Township, NJ, 08852
Salary:
65000
Posted:
November 15, 2017

Resume:

Yigong (Leo) Liu

** ***** **, ******, ** ***** 215-***-**** ac3b8r@r.postjobfree.com

Objective

Statistics master's student actively seeking full-time opportunities in data science and machine learning.

Summary

Self-motivated, flexible, and amicable Statistics master's student with a strong educational and practical background in Statistics, Data Science, Biomedical Science, and Clinical Medicine.

Familiar with machine learning techniques: Linear Regression, Logistic Regression, Lasso, Ridge, Elastic net, Decision Tree, Random Forest, Boosting, LogitBoost, MART, K-NN, SVM, Naïve Bayes, Cross-Validation, etc.

Well versed in tools such as R, Python Pandas, SQL, Spark, Hadoop MapReduce, SAS and MATLAB.

Education

Master of Science, Statistics and Biostatistics Dec 2017

Rutgers University – New Brunswick, NJ

Master of Science, Biomedical Engineering May 2012

Drexel University, Philadelphia, PA

Bachelor of Medicine, Clinical Medicine Jun 2009

Shanghai Jiaotong University – School of Medicine, Shanghai, China

Programming Tools

R: caret, glmnet, ggplot2, dplyr, tidyr, reshape2, lubridate, data.table, stringr, etc.

Python: pandas, numpy, scipy, re, sqlite3, etc.

Spark: textFile, parallelize, map, flatMap, groupByKey, reduceByKey, filter, foreach, collect, etc.

Hadoop: Spark, MapReduce, HDFS, YARN, Sqoop, Impala, Hive, etc.

SQL: create, select, from, where, group by, having, order by, join, if, case, etc.

SAS: macro, data, proc gplot, proc report, proc freq, logistic, glm, etc.

Related Working Experience

Research Assistant Jun 2017 – Present

Hunter College CUNY, Computer Science, New York, NY

Study and manage about 250 GB of fMRI image data and high-dimensional DNA datasets. Build and evaluate multiple regression models and tree-based models using machine learning techniques.

Related Projects

Build classifiers for high-dimensional DNA data to predict Alzheimer's disease

This project was implemented in R. I analyzed a patient DNA dataset with more than 20,000 variables. I built multiple classifiers using Lasso and Elastic Net logistic regression and Random Forest, and used cross-validation to choose each model's parameters. Finally, I evaluated each model by plotting its ROC curve.
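To illustrate the evaluation step only, here is a minimal Python sketch (not the original R code) of how ROC points and the area under the curve can be computed from a classifier's predicted scores. The function names, labels, and scores are hypothetical toy values chosen for illustration.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate),
    sweeping the decision threshold from the highest score downward."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Examples with tied scores cross the threshold together.
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (assumed): 1 = disease, 0 = control.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

Comparing `area` across models gives a single-number summary alongside the plotted curves.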

Design and implement logistic regression and MART algorithms for image classification

The algorithms were coded in MATLAB. In this project, I built logistic regression and MART models to classify zip-code images. Each image was first reshaped into a single-row vector, and the logistic regression and MART algorithms were then designed and implemented.
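The reshaping and logistic regression steps can be sketched roughly as follows. This is a simplified Python illustration, not the original MATLAB implementation: it assumes a binary problem, tiny 2x2 toy images, and plain batch gradient descent, whereas zip-code digit classification is multi-class. All names and values are hypothetical.

```python
import math


def flatten(image):
    """Reshape a 2-D image (list of rows) into a single row of pixels."""
    return [pixel for row in image for pixel in row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit binary logistic regression by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy images (assumed): class 1 is bright on top, class 0 is bright on the bottom.
images = [
    [[1.0, 1.0], [0.0, 0.0]],
    [[1.0, 0.8], [0.0, 0.1]],
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.1, 0.0], [0.9, 1.0]],
]
labels = [1, 1, 0, 0]
X = [flatten(img) for img in images]
w, b = train_logistic(X, labels)
predictions = [predict(w, b, x) for x in X]
```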

Target potential high-sales customers using cluster analysis

This project was implemented in SAS. I analyzed sales information for current customers. Factor analysis was first applied to reduce dimensionality. Cluster analysis was then performed, and the cluster centers of the high-sales cluster were calculated. Finally, I identified potential high-sales customers as those close to the calculated cluster centers.
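The final targeting step can be illustrated with a short Python sketch (the project itself used SAS). It assumes the customers are already described by a few factor scores, that closeness is measured by Euclidean distance, and that a fixed radius defines "close". The data, names, and radius are hypothetical.

```python
import math


def centroid(points):
    """Mean of each coordinate across the members of one cluster."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def near_center(prospects, center, radius):
    """Names of prospects whose Euclidean distance to the center is within radius."""
    return [
        name
        for name, features in prospects.items()
        if math.dist(features, center) <= radius
    ]


# Factor scores of customers assigned to the high-sales cluster (toy values).
high_sales_cluster = [[2.0, 3.0], [4.0, 5.0], [3.0, 4.0]]
center = centroid(high_sales_cluster)

# Prospective customers scored on the same factors (toy values).
prospects = {"p1": [3.0, 4.5], "p2": [0.0, 0.0], "p3": [3.5, 4.0]}
targets = near_center(prospects, center, radius=1.0)
```

Here `targets` holds the prospects that resemble the high-sales cluster most closely under the chosen radius.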

Publications

Author of 5 published papers, including:

Y. Liu, Q. Hamid, J. Snyder, C. Wang, and W. Sun, “Evaluating Fabrication Feasibility and Biomedical Application Potential of in situ 3D Printing Technology,” Rapid Prototyping.


