Data Scientist / Statistician

Location:
Cary, NC, 27519
Posted:
March 30, 2020

Resume:

Wei Xue

919-***-**** adcjq7@r.postjobfree.com USA Citizen Doctor of Public Health in Biostatistics

**** ******* *** **, ****, NC27519

Key words

Doctor of Public Health in Biostatistics

Statistical Analysis, Machine Learning, Deep Learning

Summary of Qualifications

7 years of work experience, including 3.5 years as a research assistant on clinical trial data at UNC and more than 5 years at IQVIA: building data sets, conducting statistical analysis of clinical and survey research data, constructing algorithms for and visualizing risk-based monitoring of clinical data at both the site and subject level, checking for potential fraud and misconduct in clinical trials, and working on drug safety;

Strong background in biostatistics with DrPH and MPH degrees from UNC-CH; proficient in (1) machine learning, including supervised and unsupervised learning (random forest, decision tree, regularization, gradient boosting trees, categorical boosting, SVM, KNN, isolation forest, etc.); (2) deep learning, including fully connected deep neural networks, convolutional neural networks, recurrent neural networks (GRU, LSTM), GANs, and regularization in deep learning; (3) statistical analysis, including categorical data analysis, longitudinal data analysis, survival data analysis, regression models (linear regression, logistic regression), generalized linear models, non-parametric statistical analysis, GWA studies, statistical learning (including random forest, decision tree, clustering, PCA, etc.), multivariate data analysis, outlier detection, variable selection and importance, and normalization; and risk-based monitoring (RBM) methodology at both the site and subject level;

Received the IQVIA Inventor Recognition Program Award in Dec. 2017 as a named inventor of a key IQVIA innovation. A patent application on risk-based monitoring analysis was successfully filed with the US Patent Office;

Knowledge of clinical research standards and regulatory requirements such as SDTM, GCP and ICH;

Strong programming skills: proficient in Python, SAS, R, Keras, nQuery, Spotfire and Jupyter Notebook; familiar with Unix, SQL and PySpark; knowledge of R Shiny;

Excellent written and oral communication and presentation skills, as well as grammatical and technical writing skills; topics were selected for the 2019 DIA Global conference, where I gave a presentation;

Excellent organizing skills, individual initiative and problem-solving skills;

Excellent attention to accuracy and quality, and strong multitasking skills, especially in time management;

Outstanding team member, willing to accept direction from leads and to maintain good working relationships with colleagues, managers and clients.

I am curious, passionate, authentic, and accountable. Strong learning skills.

Education

Doctor of Public Health, Biostatistics, UNC, Chapel Hill, NC 8/2011 – 8/2016

Master of Public Health, Biostatistics, UNC, Chapel Hill, NC 8/2011 – 8/2013

Master of Science, Biochemistry and Molecular Biology, Sichuan University 9/2005 – 6/2008

Awards

IQVIA Inventor Recognition Program Award (with patent under review) 12/2017

IQVIA 2018 CEO Team Awards (RBM generated $1.6 million in sales in 2017) 5/2018

Skills and Courses

Statistical programming with Python (numpy, sklearn, scipy, pandas), SAS, SUDAAN and R; knowledge of PySpark; SQL, Unix; Jupyter Notebook, Spyder; Keras (with TensorFlow backend); H2O

Visualization with Spotfire, ggplot (R), matplotlib and seaborn (Python)

nQuery, LaTeX and MS Office.

Main courses taken in the Dept. of Biostatistics at UNC-CH include Probability and Statistical Inference I & II, Statistical Computing and Data Management (SAS), Intermediate Statistical Methods, Intermediate Linear Models, Sample Survey Methodology, Categorical Data Analysis, Design of Public Health, Design and Analysis of Clinical Trials, Theory of Linear Models, Longitudinal Data Analysis, Advanced Statistical Methods in Public Health and Biometry, Generalized Linear Model Theory and Applications, Bayesian Statistics, Leadership in Biostatistics, and Deep Learning.

Projects in deep learning:

1. Use a fully connected neural network with ReLU to fit data

2. Fit a feedforward neural network with learning rate decay

3. Compare feedforward and convolutional neural networks on different datasets, such as the CIFAR-10 data

4. Use an LSTM to decide whether a sequence of letters comes from an embedded Reber grammar

5. Use an autoencoder on the CIFAR-10 data and generate decoded pictures

Working Experience

Data Trial Execution, IQVIA, Morrisville, NC

Centralized Monitoring Manager (6/2018 – Present)

Senior IPT Specialist (11/2015 – 6/2018)

Biostatistician Intern (8/2014 – 11/2015)

8/2014 – Present

Use machine learning methods to predict site cycle time, including variable selection, hyperparameter tuning and ensemble methods (based on different machine learning models).

Participate in the development, QC and enhancement of the Data-driven Trial Execution (DTE) project, applying statistical model-based risk-based monitoring analysis to the Key Risk Indicators (KRIs) of ongoing clinical trial data to identify the riskiest sites (outliers) in a protocol. Help build visualizations in Spotfire.

Work with other groups using IQVIA data to estimate key event rates in cardiovascular trials.

Take part in analytics and visualization of historical site ranks.

Lead subject-level data analysis in clinical trials, using advanced analytics and machine learning models. Work with Spotfire developers on visualization.

Lead analysis of potential fraud and misconduct in clinical trials at both the site and subject level.

Participate in statistical predictive analytics on clinical data at both the site and subject level.

Lead a statistical tiering project that combines statistical methods with RBM methods to help clinical leads tier sites.

Contribute to the methodology and visualization of the Site Segmentation project.

Construct analysis data sets and conduct multivariate statistical analysis on clinical data.

Center for Statistics for Drug Development (CSDD), Innovation Unit, QuintilesIMS, Morrisville, NC

Summer Biostatistician Intern

5/2014 – 8/2014

Perform drug safety analysis based on adverse event data from Phase I and Phase II studies.

Calculate sample size using nQuery.

Develop analysis plans, table shells, programming and table specifications; produce tables, listings and figures; perform data review and statistical analysis.

Assist with protocol development, sample size calculation, protocol and case report form (CRF) review, and writing the statistical sections of integrated reports.

Carolina Survey Research Laboratory (CSRL), University of North Carolina at Chapel Hill (UNC), Chapel Hill, NC

Research Assistant

8/2013 – 11/2015

Work on multiple projects in collaboration with CSRL faculty and clients. The projects include PUFF, Genetic Ancestry, TCORS, TCORS Youth, Colorado Parenting, the CT_Quitline study and the LA homeless study.

Use SAS to create analysis data sets from the survey data; set up strata; calculate weights and response rates; create tables and graphs; conduct statistical analysis as requested by PIs and collaborators; and create codebooks from data sets. Participate in writing papers.

Present data analysis reports to collaborators and PIs at weekly conferences. Create the weekly Incentive and PI reports from the calling survey data.

Department of Biostatistics, School of Public Health, UNC

Teaching Assistant

1/2014 – 5/2014

Grade homework and help answer questions for the course Sample Survey Methodology (BIOS664), Spring 2014.

Collaborative Studies Coordinating Center (CSCC), University of North Carolina at Chapel Hill (UNC), Chapel Hill, NC

Research Assistant

5/2012 – 8/2013

Collaborate with the group's statisticians and programmers to build analysis data sets for the Hispanic Community Health Study (HCHS).

Create tables and charts from analyses of the epidemiology of heart rate variability in the HCHS cohort and of the relationship between BMI and the HCHS cohort.

Create the HCHS cohort community data book, with charts, using SAS and SUDAAN.

Doctoral Thesis

Supervisor: Dr. Eric Bair.

Permutation-based genetic association test for secondary outcome

Propose a permutation-based inverse probability weighting (IPW) method for testing the association between secondary phenotypes and genetic traits in genome-wide association studies (GWAS) with a case-control design, where the outcome is highly associated with case-control status.

Apply the method to identify SNPs associated with the severity of orofacial pain using data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study.

Conditional group variable importance test in random forest

Develop a methodology for group conditional variable importance in random forests to answer whether a subset of variables adds information to the prediction of the outcome, conditional on the existing variables. Perform variable importance estimation and significance testing in random forests.

Verify the methods by simulation with different kinds of data.

Conditional group variable importance and statistical test in OPPERA study

Apply the methodology of group conditional variable importance to the data in the OPPERA study (including chronic TMD data and first-onset TMD data) to identify whether some predictors are still important conditional on the existing predictors.

Master’s Thesis

Supervisors: Dr. Sonia Davis and Dr. David Couper.

The association of postprandial lipemia and cardiovascular outcomes in the Atherosclerosis Risk In Communities (ARIC) Study

Conduct survival analysis of the association between postprandial lipemia (PPL) and coronary heart disease, and between PPL and stroke, in the Atherosclerosis Risk In Communities (ARIC) Study.

Use Cox proportional hazards models to analyze the predictors (TG, TRL-TG, RP, or apoBR) and covariates (such as age, gender and smoking status) for the time to cardiovascular disease events (including CHD, CVD and stroke).

Publications

Stein A, Suttie J, Baker L, Agans R, Xue W, Bowling JM. Predictors of Smoke-Free Policies in Affordable Multiunit Housing, North Carolina, 2013. Prev Chronic Dis, 12:E73, 2015

Stein AH, Baker LE, Agans RP, Xue W, Collins NM, Suttie JL. The Experience with Smoke-Free Policies in Affordable Multiunit Housing in North Carolina: A Statewide Survey. Am J Health Promot, 30(5):382-9, 2016

Genetic association analysis on secondary phenotypes and group conditional variable importance in OPPERA study (Doctoral thesis)


