Data Analyst Clinical Research

Location:
Atlanta, GA
Posted:
February 03, 2019

Annie Yang

*** ********* ****** **, *******, GA *0339 ac8drb@r.postjobfree.com 404-***-****

SKILLS

Programming: R (dplyr, glmnet, ggplot2), Python (scikit-learn, pandas, matplotlib), SAS (certified), SQL

Statistics: Probability, Statistical Inference, Hypothesis Testing, Bayesian Inference, ANOVA

Machine Learning: Supervised (linear regression, recommendation, bagging, random forest, boosting), Unsupervised (clustering, PCA), NLP

Languages: English (fluent), Chinese (native), Japanese (intermediate)

EDUCATION

Rollins School of Public Health, Emory University Atlanta, GA

Master of Science (MS), Biostatistics May 2019

• GPA: 3.84/4.0

• Relevant Coursework: Regression, R Programming, Machine Learning with Python, Survival Analysis, Statistical Inference

Emory University, College of Arts and Sciences Atlanta, GA

Bachelor of Arts (BA), Biology major, Mathematics minor May 2017

• GPA: 3.80/4.0; Dean’s List (2014 & 2016)

• Relevant Coursework: Linear Algebra, Bioinformatics, Java Programming, Cancer Genetics

WORK EXPERIENCES

MediSix Therapeutics Singapore, Singapore

Data Analyst Intern May 2018 – September 2018

• Visualized 2.5 GB of patient data and applied unsupervised machine learning algorithms to it to frame research plans for CAR-T drug development

• Established correlation analysis and discussed results with the Chief Scientific Officer to generate target feature sets and outcome metrics

• Summarized data patterns among leukemia samples by coding PCA and clustering algorithms via R and ggplot2 heatmap visualization

• Proposed target users for the new medicine based on derived conclusions, and assisted MediSix with drafting hypotheses for laboratory tests

Winship Cancer Institute Atlanta, GA

Clinical Research Assistant September 2018 – Present

• Produced descriptive statistics via SAS Macro, and examined variations between treatment and control groups via t tests and Chi-square tests

• Developed logistic regression models to investigate the univariate and adjusted effects of each variable on binary treatment outcome

• Compared progression-free survival and overall survival between treatment and control groups via log-rank tests and Kaplan-Meier curves

• Built Cox proportional-hazards models to assess the adjusted effects of treatment on progression-free survival and overall survival

• Evaluated and chose the best model through regression diagnostics, AIC, and forward model selection

Rollins School of Public Health Atlanta, GA

Data Analyst August 2014 – May 2016

• Transformed and analyzed survey results in SQL and R to explore effects of physical activity and urbanization on children’s health

• Co-authored the papers ‘Healthfulness, Modernity, and Availability of Food and Beverages: Adolescents’ Perceptions in Southern India’ and ‘The Influence of Pediatric Oncology Summer Camp Attendance on Physical Activity, Fatigue, and Oxidative Stress’

• Found that urbanization is associated with the development of a sedentary lifestyle among young adults via a t test (p-value < 0.01)

• Concluded that physical activity has a stronger influence on fatigue in higher BMI categories via Chi-square and Fisher’s exact tests (p-values < 0.001)

PROJECTS

Lending Club Risk Analysis Atlanta, GA

R Programming March 2018 – April 2018

• Fitted linear and logistic regression models on 3 GB of loan data to model credit risk via interest rate (continuous) and loan status (categorical)

• Performed data preprocessing, missing-value imputation, and feature engineering for multi-type data including numerical, categorical, and time-series variables

• Assessed model assumptions via residual diagnostics, reduced multicollinearity through regularization, and improved AUC from 0.64 to 0.81

Yelp Restaurant Recommender Singapore, Singapore

Python (Jupyter) June 2018 – July 2018

• Transformed 6 GB of unstructured review data into feature vectors by applying NLP methods such as TF-IDF vectorization

• Implemented K-Means clustering algorithms on reviews, and investigated cluster centroids to understand user preferences

• Determined top attributes of positive and negative reviews via Logistic Regression and Random Forest; reduced overfitting via PCA

• Constructed collaborative filtering recommendation system based on predictive models to customize restaurant suggestions

Breast Cancer Prediction Chicago, IL

Python (Jupyter) December 2017 – January 2018

• Utilized dimensionality reduction and classification to predict disease status, and to identify top features characterizing breast cancer

• Leveraged PCA to address multicollinearity assessed through correlation matrix, and to associate sample subgroups with clinical outcomes

• Established machine learning models with Logistic Regression, KNN, Random Forest, and Gradient Boosting to predict clinical outcomes

• Assessed model performance via ROC curves, AUC, accuracy, precision, and recall; tuned each model by grid search with cross-validation; improved prediction accuracy from 86% to 97% on test data through feature selection, model comparison, and parameter tuning


