Xiuwen Zheng
**** **** *** **, *** #**, Seattle, WA 98105, (206) 661-****, abqne3@r.postjobfree.com
CORE QUALIFICATIONS
o Five years of professional research experience in statistics, statistical genetics, machine
learning, bioinformatics, and high-performance computing
o Applications include quality control in genotypic data for genome-wide association studies
(GWAS), and high-dimensional data analysis on DNA microarrays
o Programming Languages: Proficient in C/C++ and Delphi (Object Pascal); working knowledge of
Java, Perl and Python
o Software: R, SAS, MATLAB, SQL and MPI
EDUCATION
PhD, Biostatistics, Expected 2/13
Dept. of Biostatistics, University of Washington (UW), Seattle, WA
Dissertation: Statistical Prediction of HLA Alleles and Relatedness Analysis in Genome-Wide
Association Studies
MS, Statistics, 5/07
Dept. of Mathematical Sciences, University of Texas at Dallas (UTD), TX
BA, Finance and Statistics, 7/05
Dept. of Statistics and Finance, University of Science and Technology of China (USTC), Hefei
REPRESENTATIVE PROJECTS
Human Leukocyte Antigen (HLA) prediction project
Collaborated with GlaxoSmithKline (GSK) on a study of statistical prediction of HLA alleles
Developed and applied machine learning algorithms (random forest and attribute bagging), and
prepared manuscripts
Gene Environment Association Studies project (GENEVA)
GENEVA is an NIH-funded consortium of 16 genome-wide association studies from 12 universities
and research institutes, which aims to accelerate understanding of genetic and environmental
contributions to health and disease with thousands of samples and millions of SNPs
Performed data cleaning and analysis on large-scale genotypic data, and contributed to the
preparation of manuscripts
CoreArray high-performance computing project
Developed parallel computing algorithms using C/C++ for relatedness and principal component
analysis in GWAS, and prepared manuscripts
My algorithms achieve up to a 300-fold speedup over the original serial implementations
The electronic Medical Records and Genomics (eMERGE) network project
The aim is to identify genetic variants associated with white blood cell count and differential
leukocyte types in 13,923 subjects in the eMERGE network
Performed data analysis and contributed to the preparation of manuscripts
SNP microarray project
Mosaics for large chromosomal anomalies were detected using SNP microarray data from over
50,000 subjects of GENEVA
Performed data analysis and contributed to the preparation of manuscripts
EXPERIENCE
Research Assistant, Genetics Coordinating Center, Dept. of Biostatistics, UW 9/07 - now
Performed independent and collaborative research on major projects: GENEVA, HLA, CoreArray,
eMERGE and SNP microarray
Research Assistant, Bioinformatics Lab, Dept. of Computer Science, UTD 2/06 - 12/06
Participated in the Microarray Quality Control (MAQC) Project
Conducted microarray data cleaning and analysis, and contributed to the preparation of manuscripts
Teaching Assistant, Dept. of Biostatistics, UW 10/11 - 12/11
Teaching Assistant, Dept. of Mathematical Sciences, UTD 8/05 - 5/07
Courses: Applied Calculus, Probability and Statistics for Management and Economics, Medical
Biometry
Duties: grading and leading discussions
Statistical Consulting, Dept. of Biostatistics and Statistics, UW 9/10 - 12/10
Primary projects:
Power simulation in logistic regressions with confounders for oral cancer
Correlated data analysis on the effects of the King County menu labeling regulation
PUBLICATIONS
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A High-performance Computing
Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. 2012
Oct 11. [Epub ahead of print]
Gogarten SM, Bhangale T, Conomos MP, Laurie CA, McHugh CP, Painter I, Zheng X, Crosslin
DR, Levine D, Lumley T, Nelson SC, Rice K, Shen J, Swarnkar R, Weir BS, Laurie CC.
GWASTools: an R/Bioconductor package for quality control and analysis of Genome-Wide
Association Studies. Bioinformatics. 2012 Oct 10. [Epub ahead of print]
Laurie CC, Laurie CA, Rice K, Doheny KF, Zelnick LR, McHugh CP, Ling H, Hetrick KN, Pugh
EW, Amos C, Wei Q, Wang LE, Lee JE, Barnes KC, Hansel NN, Mathias R, Daley D, Beaty TH,
Scott AF, Ruczinski I, Scharpf RB, Bierut LJ, Hartz SM, Landi MT, Freedman ND, Goldin LR,
Ginsburg D, Li J, Desch KC, Strom SS, Blot WJ, Signorello LB, Ingles SA, Chanock SJ, Berndt SI,
Le Marchand L, Henderson BE, Monroe KR, Heit JA, de Andrade M, Armasu SM, Regnier C,
Lowe WL, Hayes MG, Marazita ML, Feingold E, Murray JC, Melbye M, Feenstra B, Kang JH,
Wiggs JL, Jarvik GP, McDavid AN, Seshan VE, Mirel DB, Crenshaw A, Sharopova N, Wise A,
Shen J, Crosslin DR, Levine DM, Zheng X, Udren JI, Bennett S, Nelson SC, Gogarten SM,
Conomos MP, Heagerty P, Manolio T, Pasquale LR, Haiman CA, Caporaso N, Weir BS.
Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat Genet.
2012 May 6;44(6):642-50. doi: 10.1038/ng.2271.
Crosslin DR, McDavid A, Weston N, Nelson SC, Zheng X, Hart E, de Andrade M, Kullo IJ,
McCarty CA, Doheny KF, Pugh E, Kho A, Hayes MG, Pretel S, Saip A, Ritchie MD, Crawford
DC, Crane PK, Newton K, Li R, Mirel DB, Crenshaw A, Larson EB, Carlson CS, Jarvik GP;
Electronic Medical Records and Genomics (eMERGE) Network. Genetic variants associated
with the white blood cell count in 13,923 subjects in the eMERGE Network. Hum Genet.
2012 Apr;131(4):639-52. doi: 10.1007/s00439-011-1103-9. Epub 2011 Oct 30.
Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, Boehm F, Caporaso NE,
Cornelis MC, Edenberg HJ, Gabriel SB, Harris EL, Hu FB, Jacobs KB, Kraft P, Landi MT, Lumley
T, Manolio TA, McHugh C, Painter I, Paschall J, Rice JP, Rice KM, Zheng X, Weir BS; GENEVA
Investigators. Quality control and quality assurance in genotypic data for genome-wide
association studies. Genet Epidemiol. 2010 Sep;34(6):591-602.
Wenyuan Li, Xiuwen Zheng, and Ying Liu. Gene Selection by Matrix Reordering and
Replicator Dynamics. 7th International Workshop on Data Mining in Bioinformatics (BIOKDD
'07) in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San
Jose, CA, USA, August 2007.
Zheng X, Huang HC, Li W, Liu P, Li QZ, Liu Y. Modeling nonlinearity in dilution design
microarray data. Bioinformatics. 2007 Jun 1;23(11):1339-47. Epub 2007 Jan 19.
In progress:
David R. Crosslin, Andrew McDavid, Noah Weston, Sarah Nelson, Xiuwen Zheng, Eugene
Hart, Mariza de Andrade, Iftikhar J. Kullo, Catherine A. McCarty, Kimberly F. Doheny,
Elizabeth Pugh, Abel Kho, M. Geoffrey Hayes, Stephanie Pretel, Alexander Saip, Marylyn
Ritchie, Dana Crawford, Paul K. Crane, Katherine Newton, Rongling Li, Daniel Mirel, Andrew
Crenshaw, Eric B. Larson, Chris S. Carlson, Gail P. Jarvik, The electronic Medical Records and
Genomics (eMERGE) Network. Genetic variation associated with circulating monocyte count
in the eMERGE Network. Submitted to Human Molecular Genetics, in revision.
X. Zheng, J. Shen, C. Cox, J. Wakefield, M. Ehm, M. Nelson, B. Weir. HIBAG -- HLA genotype
imputation with attribute bagging. (submitted).
SOFTWARE PROJECTS
CoreArray C/C++ library project: developed portable and scalable storage technologies
for bioinformatic data, allowing parallel computing at the multicore and
cluster levels. http://corearray.sourceforge.net/
Two R packages (gdsfmt and SNPRelate) are available online for high-performance
computing on relatedness and principal component analysis in GWAS
PRESENTATIONS
Platform Talk, American Society of Human Genetics annual meeting (ASHG), San Francisco,
California, Nov 6-10, 2012; Title: "HIBAG -- HLA genotype imputation with attribute
bagging"; Presenting author.
Poster, International Congress of Human Genetics (ICHG), Montreal, QC, Canada, Oct 11-15,
2011; Title: "A High-Performance Computing Package for Relatedness and Principal
Component Analysis in GWAS"; Presenting author.
Platform Talk, International Congress of Human Genetics (ICHG), Montreal, QC, Canada, Oct
11-15, 2011; Title: "Somatic mosaicism of large chromosomal anomalies in blood cells of
normal adults"; Contributing author.
Poster, International Congress of Human Genetics (ICHG), Montreal, QC, Canada, Oct 11-15,
2011; Title: "Genetic variation that predicts white blood cell count differential leukocyte
types in the eMERGE Network"; Contributing author.
Poster, 8th International Conference on Forensic Inference and Statistics, Seattle, WA, Jul
18-21, 2011; Title: "How Many SNPs Does It Take To Establish Relatedness?"; Presenting
author.
Poster, American Society of Human Genetics annual meeting (ASHG), Washington DC, Nov 2-6,
2010; Title: "Statistical Prediction of Classical HLA Typing Using Unphased SNP Data";
Presenting author.
Poster, American Society of Human Genetics annual meeting (ASHG), Hawaii, Oct 20-24,
2009; Title: "Quality assurance of genotypic data for genome-wide association studies";
Contributing author.
PROFESSIONAL ACTIVITIES AND AFFILIATIONS
Referee for Genetics Research and BMC Bioinformatics
Member of American Statistical Association, American Society of Human Genetics,
ENAR International Biometric Society.
HONORS AND AWARDS
2007-Present: Graduate Study Scholarship at Univ. of Washington, Seattle
2008: Department of Biostatistics Pfizer Award at Univ. of Washington, Seattle
2007: Department of Biostatistics Pfizer Award at Univ. of Washington, Seattle
2005-2007: Graduate Study Scholarship at Univ. of Texas at Dallas