Weiyi (Claire) Gu
**** ****** ******, ***# ***, Tacoma, WA, 98402
Phone: 206-***-**** Email: ******@**.***
SKILLS
Programming languages: R, C/C++, Java, Assembly language, VHDL
Platforms & Tools: RStudio, CUDA, Eclipse, Linux, MPI, Spark, SQL Server, Visual Studio, Bioconductor
Research skills: Machine Learning, Parallel Computing, Scientific Computing, Statistical Computing, Bioinformatics, Giving Research Presentations, Writing Scientific Reports
EDUCATION
University of Washington Tacoma, WA Sep. 2013 - Mar. 2015
Master of Science in Computer Science & Systems
Coursework Highlights:
Special Topics in CSS (Big Data Analysis) 4.0/4.0
Data Mining 3.9/4.0
Distributed Systems 3.8/4.0
Artificial Intelligence 3.8/4.0
Statistical Computing 3.7/4.0
Nankai University Tianjin, China Sep 2008 - July 2012
Bachelor of Science in Computer Science
PUBLICATIONS
Ensemble Method for Clinical Outcome Prediction for AML Patients Mar. 2015
Manuscript in preparation.
PROJECT EXPERIENCE
University of Washington Tacoma, WA
Ensemble Methods for Predicting AML Clinical Outcome July 2014 - Present
R, RStudio, Bioconductor, Machine Learning
• Developed an effective feature selection and classification method to predict clinical outcomes in AML patients.
• Used prior clinical knowledge to categorize patients into two groups, then built the model.
• Compared the performance of machine learning methods in R, such as SVM, random forest, bagging, and logistic regression.
• Visualized data using xgobi.
• Applied dimensionality reduction techniques such as PCA to high-dimensional genomic data.
• Achieved an AUC of 0.94 and a balanced accuracy (BAC) of 0.85 by combining the preprocessing method with ensemble methods.
A Customer Review Analyzer for Food Products Jan. 2014 - Mar. 2014
Python, MapReduce, Hadoop, PySpark
• Extracted unigram features from customer reviews and tagged the reviews with sentiment based on SentiWordNet.
• Constructed predictive features.
• Implemented gradient descent to estimate the helpfulness of each review from the extracted features.
• Benchmarked three implementations of gradient descent (Python MapReduce, PySpark, and serial Python) on the customer review dataset.
Nankai University Tianjin, China
Accelerating HMMsearch within HMMer3 p7-pipeline Nov. 2012 - Jan. 2013
C, MPI, CUDA
Undergraduate Research Assistant - PDPL
• Exploited the MPI-CUDA paradigm to accelerate the p7-viterbi algorithm.
• Integrated the parallel version into the p7-pipeline of the HMMer3 package.
• Observed a moderate performance improvement with the parallel implementation.
WORK EXPERIENCE
University of Washington, Tacoma
Drug Sensitivity and Gene Mutation Analyses Jan. 2015 - Mar. 2015
R, RStudio
Research Assistant
• Analyzed high-throughput drug sensitivity data.
• Performed statistical tests to correlate gene mutations with drug sensitivity.
• Applied Bayesian methods to build predictive models using multivariate regression techniques.
• Submitted an abstract to the American Society of Clinical Oncology (ASCO) meeting and the European Hematology Association (EHA) meeting.
DHC Software Co., Ltd. Beijing, China
Extracting Common Therapy from Clinical Prescription Jan. 2013 - Apr. 2013
Python, Shell, Segmenters
Research Engineer
• Developed methods to recognize named entities such as patient symptoms and the corresponding prescribed medicines.
• Helped the company apply the recognition system to standardize loosely structured prescriptions.
• The resulting system extracts and categorizes 60% of the terms in a prescription.
OTHER ACTIVITIES
DREAM 9 Challenge: AML Outcome Prediction Jul. 2014 - Sep. 2014
Machine learning tools in R
• Collaborated in a team of graduate students, postdocs, and faculty.
Research Presentation at Madigan Army Hospital, Joint Base Lewis-McChord, WA Dec. 17, 2014
• Presented to a multidisciplinary team of clinicians and biologists.