Data Service

Location:
Ithaca, NY
Salary:
55000
Posted:
February 05, 2017

Resume:

Ben Liu (Ben) Greater New York Area 607-***-**** acyobe@r.postjobfree.com

EDUCATION

Cornell University - Ithaca, NY, United States Aug 2016 - Present

Master’s Degree in Data Science, GPA 4.12, Expected Graduation Date: May 2017

Courses included: Big Data Technologies with Hadoop, Machine Learning and Data Mining, Applied Statistical Computation with SAS, Database Design and Management in SAS & Oracle, Advanced R Programming for Data Science, Statistical Programming and Application with Python, Probability Model & Inference, Categorical Data Analysis, Regression Analysis

The University of Queensland - Brisbane, Australia Jun 2014 - Jul 2015

Exchange program in Mathematics & Statistics, GPA 4.0

Awards: Chinese Government Scholarship, Dean’s Commendation for Academic Excellence, Top Grade Exchange Scholarship

Courses: Optimization Theory, Time Series Analysis, Analysis of Scientific Data, Partial Differential Equations, Functional Analysis, Cryptography and Coding Theory, Econometrics, Differential Geometry, Advanced Data Analytics

East China Normal University (ECNU) - Shanghai, China Sep 2012 - Jun 2016

Bachelor of Science in Mathematics & Applied Mathematics, GPA 3.82 with Rank 2/111

Awards: First Class Honor in Bachelor Thesis (Applications of Eigenvalues in Differential Equations), Outstanding Graduate, Top Grade Scholarship (Twice), Honorary Title of Excellent Student, Shanghai Mobile Co., Ltd. Scholarship

Courses: Mathematical Modeling, Operations Research, Database and Website Construction, C++ Language, Differential Geometry, Complex Analysis, Discrete Mathematics, Ordinary Differential Equations, Analytic Geometry

PROFESSIONAL EXPERIENCE

Machine Learning and Data Mining Project Oct 2016 - Nov 2016

- Explored a dataset of 300,000 gene records and built a regression model to identify important genes and their relationships with the target gene

- Applied model selection methods (Best Subset, Backward Elimination, Stepwise) based on statistical criteria (AIC, Cp)

- Utilized shrinkage (Lasso, Ridge) methods to select appropriate variables and validated several models based on test error

- Improved the model’s accuracy using Bagging and Random Forest algorithms in R and visualized the dataset with decision trees

IMDB Movie Rating Investigation (Kaggle) Sep 2016 - Oct 2016

- Scraped data from the IMDB website and cleaned the dataset by removing outliers, scaling, and imputing missing values

- Built a multiple linear regression model, performed model diagnostics (homogeneity of variance, normality, linearity), and applied the Weighted Least Squares method and Box-Cox transformation to ensure the least squares assumptions were satisfied

- Investigated multicollinearity between predictors and conducted Principal Component Analysis to reduce dimensionality

- Compared the Random Forest and the linear model based on mean squared error and performed out-of-sample prediction

Database Management & SAS HPC with Oracle Project Nov 2016 - Dec 2016

- Implemented a database in Oracle based on a large SAS dataset (2 million observations) on the funding sources of US schools

- Applied normalization, partitioning and indexed organization techniques to perform logical and physical database design

- Integrated SAS with Oracle via the Pass-Through Facility, applied statistical methods (correlation and regression analysis) to explore the factors influencing school funding, wrote SQL queries to extract information, and generated statistical reports

Statistical Modeling for 311 Service Request in Boston Oct 2016 - Dec 2016

- Scraped the 311 service request data (1.5 million rows with 32 variables) for the year 2016 from the Boston government website

- Established several probabilistic models of service request flow and estimated parameters using MLE and the EM algorithm

- Validated the model by nonparametric (Kolmogorov–Smirnov) methods and assessed results using graphical tools (GGPLOT)

- Conducted hypothesis tests and utilized inferential tools (ANOVA, MANOVA, Chi-Square Test) to investigate the most recurrent type of service request by time of day and week across different time zones, and the relationship between location and time

- Provided recommendations on avoiding call congestion and produced a professional paper and statistical results using SAS

Project Data Analyst

School of Mathematics and Physics, University of Queensland-Brisbane, Australia Mar 2015 - Jul 2015

- Built a predictive model (logistic regression) in R to estimate whether an individual's earnings are above $60,000

- Built a statistical classification model (decision tree with the CART method) and checked the robustness of both models

- Used cross-validation to validate the model and improved its accuracy using the random forest method

- Translated technical language into non-technical terms and enhanced the interpretability of the results through visualization (plotting classification trees, ROC/AUC curves, etc.)

LICENSES & SKILLS

SAS Certified Base Programmer for SAS 9 Credential (Score: 97)

SAS Certified Advanced Programmer for SAS 9 (Expected Mar 2017)

Intermediate skills: SAS, SQL/MySQL, R, MS Excel (Vlookup, Pivot Table, Macros), Python

Basic skills: C++, Java, SPSS, Matlab, Stata, MS Access, Tableau, Lindo, Hadoop, Spark

Quantitative skills: Dimension Reduction Techniques (Principal Component Analysis, Factor Analysis), Multivariate Analysis (Inference, MANOVA, Correspondence Analysis, Profile Analysis), Discriminant Analysis (Distance, Bayesian, Fisher, Stepwise discriminant), Cluster Analysis (Hierarchical Clustering, Dynamic Clustering), Linear/Nonlinear Models, Logistic Regression, Time-Series Forecasting, Decision Tree, Optimization, Simulations, Model Selection and Assessment, Econometrics, Linear/Integer Programming, Lasso and Ridge Regression, Instrumental Variable Regression, Analysis of Variance


