Python R SAS SQL Tableau Hadoop Minitab Microsoft Access, Word

Ithaca, New York, United States
February 28, 2018


Xinyi Liu

Ithaca, NY ***** 860-***-****


A self-motivated statistics graduate with strong programming, analytical, problem-solving, and communication skills. Experienced in data analysis, data visualization, interpersonal communication, and multitasking under pressure in a dynamic environment. Able to provide employers with sound statistical analyses in a timely manner and communicate results to clients.


Cornell University, Ithaca, NY

Master of Professional Studies in Applied Statistics, Data Science; GPA: 4.12; Expected May 2018

University of Connecticut, Storrs, CT

Bachelor of Science, Statistics, Minor in Mathematics, Dean’s List; GPA: 3.71; May 2017


Computer: Python; R; SAS; SQL; Tableau; Hadoop; Minitab; Microsoft Access, Word, Excel (VBA), PowerPoint; Photoshop

Language: Mandarin Chinese, Japanese

Certifications: SAS Certified Base Programmer for SAS 9


Chinese Academy of Sciences, Beijing, China

Data Analyst Intern December 2017-January 2018

Tracked real-time data, manipulated large data sets in relational databases, and retrieved records by writing complex SQL queries.

Quickly learned the domain context and fitted predictive models in R to predict the value of PM2.5 (atmospheric particulate matter with a diameter of less than 2.5 micrometers) and its components.

Applied machine learning and predictive analytics techniques, such as stepwise selection, lasso regression, random forests, and PCA, to manipulate large structured and unstructured data sets and to design, validate, and implement models.

Investigated issues and found optimal solutions with flexibility, sound judgment, and efficiency.

Translated quantitative analyses and findings into accessible visuals for non-technical audiences using data visualization tools such as Tableau and ggplot2 in R, making the data easier to interpret.

Collaborated with the team to develop novel insights into, and a strategic perspective on, transport mechanisms in air pollution, drawing on knowledge of atmospheric physics and statistics and on strong communication skills.


Statistical Consulting Research Project with Trinity Partners January 2018 - May 2018

Approached real-world healthcare problems both quantitatively and qualitatively, partnered with statisticians to understand their needs, and drove the optimal solution by developing innovative algorithms used in analyses of commercial data.

Manipulated large claim-level data sets containing over 10 million records by writing complex SQL queries in SAS and SQL Developer to retrieve the necessary information and create modeling data sets.

Established scalable, efficient, automated processes to identify predictors of death using statistical programming skills.

Collaborated with statisticians and business partners to develop predictive analytic solutions that enable data-driven strategic decision-making using critical thinking, interpersonal, and data presentation skills.

Data Mining & Machine Learning Project on Pokémon November 2017 - December 2017

Collaborated with three group members to clean, organize, and visualize large data sets of Pokémon information.

Used data mining and machine learning algorithms such as logistic regression, random forests, and XGBoost to perform classification, feature selection, and prediction, and designed a stacking process for training and testing to minimize error rates.

Research on the Finance of US Public School Systems November 2017 - December 2017

Queried multiple relational databases with advanced, complex SQL to derive meaning from large data sets covering the financial records of over 14,000 US public schools.

Validated the results in different data management tools such as Oracle's SQL Developer and SAS.

Statistical Analysis of Students' Grades with Python June 2017

Imported, optimized, and visualized data sets with Python tools, and defined classes and functions to sort students' scores and output the desired results, such as the median score paired with the names of the students who received it.
