
Data State University

Location: San Diego, CA
Posted: June 05, 2017


Resume:

Jingman (Sally) Shi (Working EAD Available)

Tel: 317-***-**** | Email: ************@*****.*** | Address: 12642 Elisa Lane, Unit 182, San Diego, CA 92128

Master of Statistics, San Diego State University (SDSU), GPA 3.93. Significant hands-on experience with large datasets in predictive modeling and data mining. Data analyst with 1 year of experience. Familiar with a wide variety of statistical software.

PROGRAMMING EXPERIENCE

Python (1 year of experience): pandas, scikit-learn, matplotlib, regular expressions, networkx, numpy

R (2 years of experience): rpart, corrplot, MASS, car, glmnet, foreach, randomForest, ggplot2, boot

SAS (2 years of experience): PCA, discriminant analysis, factor analysis, CANCORR, cluster analysis, Hotelling's T², multivariate regression.

Java (entry level): GUI programming (JButton, Jbox, ActionListener), loops, and mathematical calculations.

Advanced Excel (more than 2 years): PivotTables, PivotCharts, lookup functions, data auditing, consolidation.

SQL (MySQL): table creation, selection, data manipulation, joins, etc.

CORE COMPETENCIES

Model Building: constructing appropriate models, including multiple linear regression, logistic regression, multivariate regression, nonparametric regression, mixed models, classification trees, and regression trees.

Model Selection: backward and forward selection, stepwise AIC (stepAIC), AIC, BIC, and other criteria for variable selection and model comparison.

Model Diagnostics: residual plots, identifying outliers and influential points, and checking model assumptions to determine suitability for modeling.

Model Validation and Evaluation: k-fold cross-validation, MSE, misclassification rate, ROC curve, confusion matrix.

Model Improvement: Adjust methods and parameters to optimize model performance

Data Mining: Supervised and Unsupervised Learning, Decision Trees (Classification and Regression Trees), Association Rules, K-Means Clustering, Bagging, Boosting, Random Forest, K-Nearest Neighbors, Multivariate Adaptive Regression Splines

Data Management: data cleaning, web scraping, identifying outliers and missing values, and merging, filtering, and sorting data.

Exploratory Data Analysis: understanding dataset structure, pattern identification, variable creation and transformation, descriptive statistics, contingency tables, and exploring correlations among predictors and relationships between the response and predictors.

Data Visualization: boxplot, correlation plot, scatterplot, histogram, 3D plot, pie chart, Tableau

Statistical Inference: Point Estimation, Confidence Interval, Statistical Tests

Statistical Tests: ANOVA, MANOVA, two-sample t-test, Fisher's exact test, normality tests, chi-squared test, etc.

Bio-Statistical Experimental Design: Crossover Design, Randomized Block Design, Factorial Design

Statistical Consulting: building models and suggesting experimental designs based on clients' needs.

PROJECT EXPERIENCE

Credit Card Default Risk Prediction: logistic regression in R to predict default and design a risk score.

Census Data Analysis: applied and compared three data mining methods in R: bagging, boosting, and random forest.

Top 100 Public and Private Universities Comparison: factor analysis, discriminant analysis, etc. in SAS.

Predicting Baseball Players' Salaries with Regression, Lasso, and Bagging: comparison of the three methods.

Prostate Cancer Prediction: logistic regression in R for prediction.

Name Popularity Analysis: plotting and analysis with pandas in Python.

Factors Affecting Students' GPA: ANOVA in SPSS on survey data.

San Diego Bird Data Exploratory Data Analysis: analysis of a large historical dataset on bird trends using R and Excel.

WORK EXPERIENCE

Data Analyst, International Student Center, San Diego State University, Aug 2016 – May 2017

20+ projects on applications: identified data patterns, explored trends, and made comparisons to support decision making.

Data cleaning, report generation, and sorted and filtered tables for different purposes.

Exploratory data analysis to identify important factors; data cleaning and hypothesis testing.

Data analysis and visualization using PivotTables, PivotCharts, and PowerPoint.

Survey analysis in SPSS to identify important factors and perform hypothesis testing.

Collaboration with different divisions and regular communication with supervisors and other staff.

EDUCATION

Master's Degree in Statistics, San Diego State University, GPA 3.93/4.0 (all A-range grades)

Unofficial Transcript

Course        Title                         Units  Grade
STAT 0510     APPL REGRESSION ANALYSIS      3.0    A
LING 0572     PYTHON SCRIPTING              3.0    A
STAT 0795     PRAC STATISTICL CONSULTNG     3.0    CR
STAT 0670 A   ADV MATHEMATICAL STATS        3.0    A
STAT 0670 B   ADV MATHEMATICAL STATS        3.0    A
STAT 0680 A   ADV BIOSTAT METHODS           3.0    A
STAT 0680 B   ADV BIOSTAT METHODS           3.0    A-
STAT 0520     APPLIED MULTIVARIATE ANAL     3.0    A
STAT 0551 A   PROB & MATH STATISTICS        3.0    A
STAT 0551 B   PROB & MATH STATISTICS        3.0    A-
STAT 0696     STATISTICAL COMMUNICATION     3.0    A-
MATH 0254     INTRO TO LINEAR ALGEBRA       3.0    A
STAT 0720     SEMINAR                       1.0    A
STAT 0575     ACTUARIAL MODELING            3.0    A
