Yao Yu
CONTACT
Simi Valley, CA 93065
*********@*****.***
LinkedIn: yaoyu1
SUMMARY
Solid statistical and machine learning knowledge in data analysis; eight years of hands-on experience
in data cleaning, modeling, and mining; proficient in programming in Java, Python, R, C++, and SQL;
excellent interpersonal and communication skills
EDUCATION
Jul 2012 Ph.D. in Applied Statistics University of California, Riverside
Thesis: Bayesian and Non-parametric Approaches to Missing Data Analysis
GPA: 3.93/4.0
Jul 2007 Bachelor in Statistics University of Science and Technology of China
PROGRAMMING
Expert: R, SQL, Java, SAS
Intermediate: Python, C++, Matlab
Beginner: Scala
COURSEWORK
Machine Learning, Algorithms and Data Structures, Data Science, Theoretical Statistics and
Probability, Time Series, Statistical Computing, Advanced Design and Analysis of Experiments,
Multivariate Analysis, Statistical Consulting and Data Analysis, Bayesian Statistics,
Nonparametric Methods
EXPERIENCE
2012–Present Amgen Inc. Thousand Oaks, California
Biostatistics Manager
Highly self-motivated professional and team player working on multiple critical tasks across
drug development. Main responsibilities involve:
• Providing sound, strategic statistical input to optimize study design and meet scientific and
business justifications
• Serving as subject-matter expert in data imputation and Bayesian statistical techniques, providing
thought leadership and consulting expertise to other statisticians on the team
• Gathering business insights and collaborating with experts across multiple functions to conduct
studies under aggressive timelines
• Presenting and articulating complex statistical concepts to audiences at various levels, from
executive management to programmers
• Actively contributing to innovative exploratory statistical solutions:
– Investigating potential drug effects on blood pressure fluctuation based on patients’
measurements taken every 15 minutes over a long period, adjusting for other effects and interactions
– Researching the correlation between weight and drug concentration by performing a meta-analysis
on data pooled from multiple sources
– Implementing Bayesian analysis to scale the historical data, which allowed the new study to be
downsized at the design stage
– Developing a sequential design procedure to evaluate the equivalence of two drugs with
integrated composite scores
2009–2012 University of California, Riverside Riverside, California
Research Assistant
Focused on two approaches to missing data analysis: a non-parametric method without
distributional assumptions and Bayesian methods
• Developed an extended Fisher discriminant (with the test threshold determined
by bootstrapping) to classify missing-data types
• Improved the computational efficiency of the MCMC algorithm for predictive
models of multilayered missing data in a large survey dataset under different
scenarios
• Optimized model selection by performing model checking and
goodness-of-fit tests
2009 Amgen Inc. Thousand Oaks, California
Intern
Conducted a literature review and implemented a Monte Carlo simulation procedure for data imputation
and analysis
• Performed missing-data pattern recognition and developed visual tools to illustrate the impact
• Compared the performance of single-value imputation, mixed-effects models, weighted estimating
equations, and Bayesian approaches in a simulation study under different assumptions
2008–2009 University of California, Riverside Riverside, California
Student Consultant
• Identified covariates that contribute to the differences in cervicovaginal cytokine concentrations
between pregnant and non-pregnant women using robust principal component analysis (data
from the Department of Plant Pathology & Microbiology, UC Riverside)
• Cleaned the data and evaluated the performance of Naive Bayes, CART, and random forest in
classifying estimated value ranges of pre-owned cars (data from KBB)
2007–2012 University of California, Riverside Riverside, California
Teaching Assistant
Assisted with teaching graduate-level courses; duties included leading discussions, clarifying
related concepts, and guiding statistical analysis with software (Minitab, Excel, and SAS).
LEADERSHIP
2013 Medical Science Biostatistics department at Amgen Inc. Thousand Oaks, California
Team Lead
Led the team to improve the operational efficiency of knowledge sharing
• Organized and facilitated the weekly meeting and allocated the workload
• Divided the work, identified the aspects needing improvement, brainstormed
possible solutions, and analyzed their impact and feasibility
• Presented the proposal to executive management and initiated the process to optimize work
efficiency
PUBLICATIONS
Jun Li, Yao Yu, A Nonparametric Test of Missing Completely at Random for Incomplete Multivariate Data,
Psychometrika, 2014. doi: 10.1007/s11336-014-9410-4