Zixiao (Melody) Zhang
**** * ******** *****, *** 316, Arlington, VA 22201 ■ 202-***-**** ■ adhrau@r.postjobfree.com
SAS Certified Specialist seeking a position as a data analyst, statistician, biostatistician, or in a related area
EDUCATION
M.S. in Biostatistics December 2020
Georgetown University, Washington, D.C.
Coursework: Clinical Trial, Machine Learning, Data Science, Categorical Data Analysis, Epidemiology, Survival Analysis
B.S. in Statistics July 2019
Hong Kong Baptist University, Zhuhai, China
Coursework: Time Series Analysis, Data Mining, Simulation, Multivariate Analysis
WORK EXPERIENCE
Research Assistant
Biostatistics Department, Georgetown University, Washington, D.C. January 2020 – May 2020
• Assisted on 6 projects, selecting covariates of interest and merging health status data collected from Georgetown University Hospital
• Checked normality of continuous variables using the Shapiro-Wilk test and produced summary statistics and univariate tests using the R package “tableone”
• Tested associations between continuous and categorical variables in R, assessing significance by p-value: the Kruskal-Wallis test for non-normal variables, the Wilcoxon rank-sum test for two-group comparisons, and ANOVA for normally distributed variables
• Generated 6 summary reports and annotated the results with interpretive comments in Excel
RELATED EXPERIENCE
Research on Breast Cancer in Microarray Studies (R) September 2020
• Pre-processed raw data using Bioconductor packages in R
• Developed a user-defined R function implementing the quantile normalization algorithm
• Conducted moderated t-tests and selected genes at a Benjamini-Hochberg (BH) adjusted p-value cutoff using the R package “limma”
• Performed dimensionality reduction using Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Multidimensional Scaling (MDS)
Project – Relationship between Diabetes and Vitamin C (SAS) August 2020
• Cleaned NHANES 2005-2006 data by removing records with missing values and identified diabetes cases using PROC SQL
• Summarized the data in a baseline table reporting the mean, median, minimum, and maximum, using PROC CONTENTS, PROC MEANS, and PROC FREQ
• Conducted normality tests, t-tests, and logistic regression, assessing significance by p-value, using PROC TTEST and PROC LOGISTIC
• Generated reports using procedures such as PROC SORT, PROC FREQ, and PROC UNIVARIATE; concluded that serum vitamin C concentration was significantly lower in diabetics than in non-diabetics in the US population, with the difference more significant in the non-Hispanic group
Machine Learning Research Based on Mental Health Data (Python) June 2020
• Cleaned data by handling missing values and developed data processing algorithms to transform non-standard data
• Split data into training and testing sets using the package “scikit-learn”
• Explored data using a correlation matrix and created charts to visualize data with Seaborn
• Developed models to predict treatment using machine learning methods including Logistic Regression, KNN, and Decision Tree in “scikit-learn”; tuned parameters in Python to reach 82% accuracy, comparing models by confusion matrix and ROC/AUC score
CERTIFICATIONS AND TECHNICAL SKILLS
Certifications
• SAS Certified Specialist: Advanced Programming Using SAS 9.4
• SAS Certified Specialist: Base Programming Using SAS 9.4
Technical skills
• R, SAS, Python, SPSS, Excel, RDBMS (MySQL), Tableau, AWS, Linux, LaTeX