Hanchao Liu
404-***-**** *********@*****.*** https://github.com/hankliu43
TECHNICAL SKILLS
Python (sklearn, NumPy, SciPy, Pandas, mrjob, PySpark, matplotlib, Flask, IPython Notebook)
SQL
Spark, MapReduce, OpenMP
Fortran, Scala, Matlab
Bash, Linux
Predictive modeling (regression, classification), cluster analysis, numerical analysis
EXPERIENCE
The Data Incubator Program 2016/1 - Present
A highly selective, intensive training program in data science for PhDs (47 fellows from 2,043 applicants).
PhD and Postdoctoral Researcher 2010/9 - 2015/12
Emory University
Keywords: predictive modeling, numerical analysis, feature engineering
Built many-body weighted least-squares models for complex molecular systems of arbitrary dimension, using features such as permutationally invariant polynomials, on roughly 100,000 data points.
Performed non-convex optimization, reduced matrix dimensions using a "divide and conquer" strategy, and solved eigenvalue problems to simulate the IR spectra of water in several phases, explaining several experimental phenomena that had been debated for decades.
Wrote 20,000 lines of code in Python and Fortran, using OpenMP for parallel computing.
Published 9 papers and presented at 8 national conferences.
DATA SCIENCE PROJECTS
A Machine Learning Approach to Design Bike Share System 2016/1 - 2016/2
Keywords: machine learning, optimization, API, visualization, recommendation system
Developed a predictive model for bike ridership based on location features such as accessibility to public transportation and density of public attractions, using data from 600,000 bike trips and location data from the Google Maps API.
Visualized the predictions as a heat map.
Maximized the ridership and coverage of the bike station network, using a customized cost function and an iterative optimization strategy.
Recommended 200 new bike station locations for the Bay Area Bike Share Program.
Linkage Analysis of Wikipedia Articles using MapReduce 2016/1
Keywords: big data, MapReduce, web scraping
Parsed the entire English Wikipedia, comprising 15,000,000 pages and 70 GB of data.
Calculated summary statistics on the number of unique links from each page to other pages, using reservoir sampling.
Identified the 100 most conceptually connected topics by multiplying the link adjacency matrix by itself, implemented in MapReduce.
EDUCATION
PhD in Computational Chemistry Emory University Atlanta, GA 2010/9 - 2015/8
BS in Chemistry Lanzhou University Lanzhou, China 2006/9 - 2010/6
AWARDS
Osborne R. Quayle Award 2014
for excellence in graduate studies (3 awardees from 125 candidates)