Data Scientist Intern

Location: New York, NY
Posted: June 06, 2017

Resume:

Jingjing Feng

**** ********, *** ****, ** ac0o96@r.postjobfree.com

www.linkedin.com/in/fengjingjing 646-***-****

SUMMARY

To obtain a full-time position as a Data Analyst

Master's student in Statistics with proficiency in R, Python, and SQL, specializing in machine learning and statistical modeling, with hands-on experience in processing large data sets and managing projects.

EDUCATION

Columbia University, New York, NY Sep. 2015 - Dec. 2016

• Master of Arts in Statistics (GPA: 3.6/4.0)

Wuhan University, Wuhan, China Sep. 2011 - Jun. 2015

• Bachelor of Science in Statistics (GPA: 3.5/4.0)

• Top-tier scholarship for academic excellence, Outstanding Student for two consecutive years

• Cultural Director of Student Union

SKILLS

• Technical Skills: Python, R, SQL, Tableau, R Shiny, SAS, Unix/Linux, VBA, Neo4j

• Data Analysis Methods: Machine Learning Algorithms, Data Visualization (ggplot2, plotly), Data Cleaning and Manipulation (dplyr, Pandas), Linear Regression Models, Hypothesis Tests, ANOVA, Time Series Analysis

EXPERIENCE

Predictive modeling for image classification, Columbia University, NY Nov. 2016

• Extracted image features from 2,000 JPG files via the Caffe library in Python and applied Principal Component Analysis and Random Forest to implement feature selection.

• Used a Microsoft API to scrape another 2,000 images from the internet as testing data.

• Applied cross-validation and achieved a classification error rate of 29% using SIFT features with Gradient Boosting Machine and a 5% error rate using Caffe features with Logistic Regression.

Machine learning algorithms in stock market forecasting, Columbia University, NY Nov. 2016

• Scraped and cleaned 3 GB of financial data from the Quandl website using Python. Implemented feature selection using machine learning algorithms such as Random Forest, LASSO, and Principal Component Analysis in R.

• Built classification models such as Logistic Regression, Support Vector Machine, Gradient Boosting Machine, Random Forest, and K-Nearest Neighbors to forecast the trend of the stock market.

Citi Bike Navigation - an R Shiny app development, Columbia University, NY Oct. 2016

• Built an application providing navigation for Citi Bike users using R Shiny.

• Extracted 17 JSON datasets containing real-time Citi Bike station information from the website using Python.

• Used a Google API to recommend nearby Citi Bike stations and routes and to estimate journey time based on input information (e.g. starting point, destination, when to use Citi Bike, and the number of stops).

American Community Surveys, Columbia University, NY Sep. 2016

• Conducted exploratory data analysis on the American Community Survey using R. Analyzed the relationships between divorce rates and selected features (e.g. education levels, income, race, industry, and working hours).

• Drew JavaScript graphs using the htmlwidgets package in R and presented the code and report in a GitHub repository.

• Concluded that individuals with a high education level, decent income, appropriate working hours, and jobs in the engineering and education industries tend to have more stable marriages.

Data Scientist Intern, EmployToy (A Tech Start-up), New York, NY Jun. - Aug. 2016

• Designed a graph data model with the query language Cypher in the graph database Neo4j to illustrate data connections in the recruiting process.

• Assisted the marketing team in generating reports by conducting exploratory data analysis and implementing data visualization on four recruiting metrics using R and Tableau.

• Enabled various interactions between the server and Neo4j with the dynamic model in Python.

Text Mining, Columbia University, NY Nov. - Dec. 2015

• Applied Naive Bayes Classification, Tree Classification, and Regularized Logistic models in R to classify the Federalist Papers and distinguish their authors (Alexander Hamilton and James Madison).

• Applied cross-validation and feature selection to find the best model, which has an error rate of 4%.


