
Python, R, Java, C++, SQL

Durham, North Carolina, United States
January 12, 2018


Xilin Cecilia Shi

Address: *** ******** ****, *** ***, Durham, NC, USA Phone: 919-***-**** Email:

EDUCATION

Duke University Durham, NC

MS in Statistical Science (GPA: 3.84) June 2018 (expected)

Coursework: Machine Learning, Bayesian Statistics, Statistical Programming, Predictive Modeling, Data Structures and Algorithms, Introduction to Databases, Categorical Data

Hong Kong University of Science and Technology (HKUST) Hong Kong

BSc in Statistics and Financial Mathematics (First Class Honors, GPA: 3.6), Business Minor May 2016

Coursework: Time Series, Statistical Inference, Regression, Stochastic Modeling, Object-oriented Programming

Honors and Scholarships: Dean’s List, Fung Scholarship, University Scholarship for Undergraduate Students

ETH Zurich Zurich, Switzerland

Exchange Program September 2014 - January 2015


Duke Clinical Research Institute Durham, NC

Data Science Intern May 2017 - present

Built neural network models to predict patient admissions from electronic health record (EHR) data and claims data

Cast survival analysis as a ranking problem, implemented a Cox proportional hazards model as the baseline, and compared it with deep learning methods (multilayer perceptron and deep generative models) using Keras in Python

Helped care managers allocate resources more efficiently and reduce costs by identifying factors that contribute to a higher risk of admission

Duke Social Science Research Institute Durham, NC

Consultant January - April 2017

Worked in teams to advise researchers on statistical modeling, including experimental design, data manipulation, data analysis, and visualization

Development of Clustering Algorithms for Ensemble Weather Forecasts Hong Kong

Research in Industrial Projects for Students (RIPS) June - August 2015

Performed cluster analysis for the Hong Kong Observatory to compress the massive data generated from ensemble forecasts; investigated 11 clustering metrics such as k-means and hierarchical agglomerative clustering, evaluated their performance and compared the results using R

Tested the applicability of algorithms and robustness of metrics using real-world model output data for high-impact weather situations, and presented insights to the Observatory

COMPETITIONS AND PROJECTS

Text Mining on Movie Corpus September - December 2017

Web-scraped a list of 3000 movies on Netflix and their corresponding IMDb reviews and Wikipedia page descriptions

Used natural language processing techniques to preprocess the movie corpus and extract collocations; explored different similarity metrics to measure distances among movies based on the collocations

Proposed a recommendation system based on topic modeling with Latent Dirichlet Allocation (LDA)

ASA DataFest March 2017

Identified patterns in customer booking behavior by analyzing hotel information and click records from Expedia

Predicted booking activity using random forest, elastic net, and logistic regression, and visualized the analysis

Automated Stock Trading using Support Vector Machine February - May 2016

Applied a Support Vector Machine (SVM) to predict the future direction of stocks listed on the Hong Kong Stock Exchange

Conducted feature selection for 12 technical attributes and optimized the hyperparameters for SVM

Cash Algo Inter-University Trading Contest January - March 2016

Performed pairs trading for stocks listed on the Hong Kong Stock Exchange and achieved a rate of return of 178.02%

Presented results to the sponsor and won bronze medals for highest return and for best model concept

SKILLS

Python, R, SQL, Java, C++, SAS, MATLAB, HTML, LaTeX, TensorFlow
