Data Analyst

Location:
Huntington Station, NY
Posted:
May 17, 2017

Yunxiang Gan

Phone: 1-848-***-**** Email: ac0c1w@r.postjobfree.com

EDUCATION

Oct. 2017 Rutgers University, School of Arts and Sciences New Brunswick, NJ

M.S. Statistics

June 2015 Hunan Normal University Hunan, China

B.S. Biological Science

CORE QUALIFICATIONS

Certification: SAS Certified Base Programmer for SAS 9;

Extensive knowledge of Base SAS, SAS SQL, SAS macros, R, SQL, and Python;

Familiar with CDISC standards (SDTM, ADaM);

Level-headed and able to work calmly and efficiently in high-stress situations and under imminent deadlines;

Relevant Courses: Regression Analysis for Applied Biostatistics, Categorical Analysis, Bio-computing (SAS), Design of Experiments, Interpretation of Data, Multivariate Analysis, Data Mining, Machine Learning.

PROJECTS

The Relationship Between BMI and Alcohol Consumption 2015 Rutgers University

Merged all data sets with the data step merge statement, re-coded all covariates, and created dummy variables using proc format; used if statements and do loops to delete all observations with missing values;

Examined the relationship between each variable and BMI using proc corr and proc ttest, and conducted descriptive analysis using proc means for continuous variables such as age and proc freq for categorical variables such as smoking status and alcohol consumption;

Used a stepwise method to construct the optimal model and checked the normality assumption, using proc glm.

Tumor Prediction by Gene Expression 2016 Rutgers University

Built an R package implementing the enriched-ensembles methodology for variable selection: wrote a new function, conducted cross-validation, and calculated a weight vector W that reflects variable importance;

Sampled K variables and selected the subset with the best performance, chose LDA as the model to predict on the testing set, and obtained the prediction vectors Pi for performance evaluation;

Evaluated the package by accuracy rate on both the training and testing sets (0.9 on each), and compared the LDA results against other algorithms such as ADMM, CART, and Random Forest, all of which had accuracy rates over 0.85.

R Package for Text Mining and Machine Learning 2017 Rutgers University

Used tm_map to remove numbers, capitalization, stopwords, and white space, generated the corresponding DocumentTermMatrix, and used removeSparseTerms to handle sparsity across the whole data set; then checked and plotted term frequency in R with the wordcloud package;

Built an R package to apply machine learning methods to the DocumentTermMatrix and analyze its patterns, including Random Forest, SVM, AdaBoost, glmnet, CART, ANN, PCA-LDA, penalized methods, and weighted methods;

Output the accuracy rates of the different algorithms on the training and testing data sets into one table and automatically identified the algorithm with the highest accuracy.

INTERNSHIP

01/2012-12/2014 Xiangbei Welman Pharmaceutical Co., Ltd. Hunan, China

Data Analyst

Pre-processed phase II-IV clinical data for the development of penicillin powder, obtained from the data management department: deleted missing values and spaces, formatted categorical variables, and converted raw data into a form better suited to SAS;

Tried a few models commonly used in developing penicillin powder, then optimized the models for auditing and analysis against other clinical data to ensure forecasting accuracy;

Assisted senior analysts with validation reports, reviews, and saving assessments.


