
Python, machine learning, MapReduce, Spark

Location:
Union City, CA
Posted:
March 08, 2016


Resume:

Hanchao Liu

404-***-**** *********@*****.*** https://github.com/hankliu43

TECHNICAL SKILLS

Python (sklearn, NumPy, SciPy, Pandas, mrjob, PySpark, matplotlib, Flask, IPython Notebook)
SQL
Spark, MapReduce, OpenMP
Fortran, Scala, Matlab
Bash, Linux
Predictive modeling (regression, classification), cluster analysis, numerical analysis

EXPERIENCE

The Data Incubator Program 2016/1 - Present

A highly selective, intensive training program in data science for PhDs (47 fellows selected from 2,043 applicants).

PhD and Postdoctoral Researcher, Emory University 2010/9 - 2015/12

Keywords: predictive modeling, numerical analysis, feature engineering

Built many-body weighted least-squares models of arbitrary dimensionality for complex molecular systems, using features such as permutationally invariant polynomials and roughly 100,000 data points.

Performed non-convex optimization, reduced matrix dimensions using a "divide and conquer" strategy, and solved eigenvalue problems to simulate the IR spectra of water in several phases, explaining several experimental phenomena that had been debated for decades.

Wrote 20,000 lines of code in Python and Fortran and used OpenMP for parallel computing.

Published 9 papers and presented at 8 national conferences.

DATA SCIENCE PROJECTS

A Machine Learning Approach to Designing a Bike Share System 2016/1 - 2016/2

Keywords: machine learning, optimization, API, visualization, recommendation system

Developed a predictive model of bike ridership based on location features, such as accessibility to public transportation and density of public attractions, using data on 600,000 bike trips and location data from the Google Maps API.

Visualized the predictions as a heat map.

Maximized the ridership and coverage of the bike station network using a customized cost function and an iterative optimization strategy.

Recommended 200 new bike station locations for the Bay Area Bike Share Program.

Linkage Analysis of Wikipedia Articles Using MapReduce 2016/1

Keywords: big data, MapReduce, web scraping

Parsed all English Wikipedia pages (15,000,000 pages, 70 GB of data).

Calculated summary statistics on the number of unique links from each page to other pages using reservoir sampling.

Discovered the 100 most conceptually connected topics by multiplying the link adjacency matrix by itself, implemented in MapReduce.
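The two techniques named in the Wikipedia project can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not the code used in the project. `reservoir_sample` is the textbook Algorithm R. The matrix self-multiplication is written as two in-memory map/shuffle/reduce passes (a join on the shared page index, then a sum) standing in for a real mrjob or Hadoop job. The `(src, dst, weight)` edge-list input and the `top_pairs` ranking are assumptions.

```python
# Hypothetical sketch, not the project's code: Algorithm R plus an
# in-memory simulation of computing A @ A with two MapReduce steps.
import heapq
import random
from collections import defaultdict
from itertools import chain


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: keep a uniform random sample of k items from a stream
    whose length is not known in advance, using O(k) memory."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # uniform over 0..i inclusive
            if j < k:                       # happens with probability k / (i + 1)
                reservoir[j] = item
    return reservoir


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce step: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return list(chain.from_iterable(reducer(key, values) for key, values in groups.items()))


def join_mapper(edge):
    """Emit each link A[src, dst] twice, keyed by the index it shares in A @ A."""
    src, dst, weight = edge
    yield dst, ("L", src, weight)   # left factor A[i, j]: join on its column j = dst
    yield src, ("R", dst, weight)   # right factor A[j, k]: join on its row j = src


def join_reducer(j, values):
    """For a shared page j, pair every i -> j link with every j -> k link."""
    left = [(i, w) for tag, i, w in values if tag == "L"]
    right = [(k, w) for tag, k, w in values if tag == "R"]
    for i, w_left in left:
        for k, w_right in right:
            yield (i, k), w_left * w_right


def identity_mapper(pair):
    yield pair


def sum_reducer(key, values):
    yield key, sum(values)


def square_adjacency(edges):
    """Return the nonzero entries of A @ A as {(i, k): weighted 2-step path count}."""
    partial_products = run_mapreduce(edges, join_mapper, join_reducer)
    return dict(run_mapreduce(partial_products, identity_mapper, sum_reducer))


def top_pairs(squared, n):
    """Hypothetical ranking: the n page pairs joined by the most 2-step paths."""
    return heapq.nlargest(n, squared.items(), key=lambda item: item[1])


# Usage on a toy link graph (hypothetical pages "A".."D", unit weights).
edges = [("A", "B", 1), ("B", "C", 1), ("A", "D", 1), ("D", "C", 1), ("C", "A", 1)]
squared = square_adjacency(edges)
print(top_pairs(squared, 1))   # [(('A', 'C'), 2)]: A reaches C via both B and D

link_counts = range(1000)      # stand-in for per-page unique-link counts
sample = reservoir_sample(link_counts, 10, rng=random.Random(0))
print(len(sample))             # 10
```

Working from a sparse edge list keeps only the nonzero entries, which matters at this scale: a dense 15,000,000 by 15,000,000 adjacency matrix would be impractical to store, while the join on the shared index only touches pairs of links that actually meet at a common page.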

EDUCATION

PhD in Computational Chemistry, Emory University, Atlanta, GA 2010/9 - 2015/8

BS in Chemistry, Lanzhou University, Lanzhou, China 2006/9 - 2010/6

AWARDS

Osborne R. Quayle Award 2014

for excellence in graduate studies (3 awardees from 125 candidates)


