Chicago, Illinois, United States
January 14, 2019

Qinnan Zhu

Chicago, IL


Northwestern University, Chicago, IL Expected Jun. 2019

Master of Science, Biostatistics - Statistical Methods and Practice Track

University of Illinois at Urbana-Champaign, Champaign, IL

Bachelor of Science, General Mathematics

Bachelor of Science, Applied Statistics GPA: 3.5


Languages R, Python, SQL, SAS, LaTeX, Bash

Applications and platforms Linux, Pandas, NumPy, Scikit-Learn, Jupyter, VirtualBox, PyCharm, Eclipse

WORK EXPERIENCE

Data Analyst Intern Jun. 2017 - Aug. 2017

Xiangtou Hi-Tech Venture Capital, Changsha, China

Applied Python NumPy and Pandas to preprocess data from the local administration: imputed missing values, classified categorical data, and normalized numeric factors. Cut the run time of Excel reporting by 50%.

Applied natural language processing to uncoded consumer reports and texts based on customers’ ratings, deriving specific characteristics of successful products.

Performed exploratory data analysis and generated visualizations of capital investments in emerging cities using Matplotlib and Plotly. Helped the company develop growth strategies and brainstormed ideas for target cities.

Business Analyst Intern Jul. 2016 - Aug. 2016

China Construction Bank, Changsha, China

Gathered large-scale data from the database by writing MySQL queries, which helped the department compose its monthly review of financial and operational performance.

Evaluated real-world raw data with regression methods and presented conclusions and strategic visions that increased quarterly revenue by 12%.

Communicated with clients and managers while minimizing the number of follow-ups and complaints, cutting the team’s workload by 30%.


Benchmarking Network Analysis Methods (Bioinformatics) Sep. 2018 - Present

Department of Preventive Medicine, Feinberg School of Medicine Graduate Thesis

Systematically compare different approaches for pathway analysis on ovarian cancer and quantify their concordance, which could inform future researchers' selection of methods.

Integrate gene expression datasets, including computing pairwise correlations and pathway-level statistics using multiple Bioconductor packages in R.

Preprocess large-scale raw data, including data mining, data wrangling, and data transformation, using R tidyr and dplyr.

National Longitudinal Surveys Exploration Jun. 2018

Analyzed the gender wage gap and potential factors that may affect the disparity using NLSY79 datasets with 10k rows and 300 columns from the Bureau of Labor Statistics.

Produced descriptive and graphical statistics using Seaborn to summarize features of high and low income by gender.

Evaluated multiple supervised machine learning classification models in Python, including SVM, Random Forest, and Naive Bayes. Obtained an accuracy score of 87.6%.

BigMart Sales Analysis Mar. 2018

Built a predictive model to identify the properties of 1550 products across 20 outlets that are associated with increased sales.

Performed data preprocessing in R, including data cleaning, missing-value imputation, skewed-data transformation, and categorical variable scaling.

Trained multiple models, including linear, Lasso, and Ridge regression and XGBoost, and tuned their hyperparameters. Achieved the highest accuracy score of 0.824 with XGBoost, which gave the company a better understanding of the sales of each product at a particular store.

