Jingman (Sally) Shi (Working EAD Available)
Tel: 317-***-**** Email: ************@*****.*** Address: 12642 Elisa Lane, Unit 182, San Diego, CA 92128

Master's degree in Statistics from SDSU with a 3.93 GPA. Significant hands-on experience with large datasets in predictive modeling and data mining. Data Analyst with 1 year of experience. Familiar with a wide variety of statistical software.

PROGRAMMING EXPERIENCE
Python (1 year of experience): pandas, scikit-learn, matplotlib, regular expressions, networkx, numpy
R (2 years of experience): rpart, corrplot, MASS, car, glmnet, foreach, randomForest, ggplot2, boot
SAS (2 years of experience): PCA, discriminant analysis, factor analysis, CANCORR, cluster analysis, Hotelling’s T^2, multivariate regression
Java (entry level): GUI, JButton, Jbox, ActionListener, loops and math calculations
Advanced Excel (More than 2 years): Pivot table, Pivot Chart, lookup, data auditing, consolidation.
SQL (MySQL): familiar with table creation, selection, manipulation, joins, etc.

CORE COMPETENCIES
Model Building: Construct proper models: multiple linear regression, logistic regression, multivariate regression, nonparametric regression, mixed models, classification trees, regression trees
Model Selection: Backward and forward selection, stepwise AIC, AIC, BIC, and other methods for variable selection and model comparison
Model Diagnostics: Residual Plot, identify outliers and influential points, check model assumptions to determine suitability for modeling
Model Validation and Evaluation: K-fold cross validation, MSE, Misclassification Rate, ROC Curve, Confusion Matrix.
Model Improvement: Adjust methods and parameters to optimize model performance
Data Mining: Supervised and Unsupervised Learning, Decision Trees (Classification and Regression Trees), Association Rules, K-Means Clustering, Bagging, Boosting, Random Forest, K-Nearest Neighbors, Multivariate Adaptive Regression Splines
Data Management: data cleaning, web scraping, identifying outliers and missing values, merging, filtering, and sorting data.
Exploratory Data Analysis: understanding the structure of datasets, pattern identification, variable creation and transformation, descriptive statistics, contingency tables, exploring correlation between predictors and the relationship between response and predictors.
Data Visualization: boxplot, correlation plot, scatterplot, histogram, 3D plot, pie chart, Tableau
Statistical Inference: Point Estimation, Confidence Interval, Statistical Tests
Statistical Tests: ANOVA, MANOVA, two-sample t test, Fisher’s exact test, Normality Test, chi-squared test, etc.
Bio-Statistical Experimental Design: Crossover Design, Randomized Block Design, Factorial Design
Statistical Consulting: Build models and give experimental design suggestions based on clients’ needs.

PROJECT EXPERIENCE
Credit Card Default Risk Prediction: Used logistic regression in R to make predictions and design a risk score
Census Data Analysis: Used R to apply and compare three data mining methods: bagging, boosting, and random forest
Top 100 Public and Private Universities Comparison: Used SAS for factor analysis, discriminant analysis, etc.
Using Regression, Lasso, and Bagging to Predict Baseball Players’ Salaries: Comparison of the three models
Prostate Cancer Prediction: Used logistic regression in R to make predictions
Name Popularity Analysis: Used pandas in Python for plotting and analysis
Factors Affecting Students’ GPA: Used SPSS to run ANOVA tests on survey data
San Diego Bird Data Exploratory Data Analysis: Analysis of large historical data on bird trends in R and Excel

WORKING EXPERIENCE
Data Analyst, International Student Center, San Diego State University, Aug 2016 – May 2017
Completed 20+ projects on applications to identify data patterns, explore trends, and make comparisons to support decision making.
Cleaned data, generated reports, and provided sorted and filtered tables for different purposes.
Performed exploratory data analysis to identify important factors, along with data cleaning and hypothesis testing.
Analyzed and visualized data using pivot tables, pivot charts, and PowerPoint.
Used SPSS to analyze surveys, identify important factors, and run hypothesis tests.
Worked with different divisions and communicated regularly with the supervisor and other staff.

EDUCATION
Master’s Degree in Statistics, GPA 3.93/4.0, All A, San Diego State University
Unofficial Transcript
Course        Title                      Units  Grade
STAT 0510     APPL REGRESSION ANALYSIS   3.0    A
LING 0572     PYTHON SCRIPTING           3.0    A
STAT 0795     PRAC STATISTICL CONSULTNG  3.0    CR
STAT 0670 A   ADV MATHEMATICAL STATS     3.0    A
STAT 0670 B   ADV MATHEMATICAL STATS     3.0    A
STAT 0680 A   ADV BIOSTAT METHODS        3.0    A
STAT 0680 B   ADV BIOSTAT METHODS        3.0    A-
STAT 0520     APPLIED MULTIVARIATE ANAL  3.0    A
STAT 0551 A   PROB & MATH STATISTICS     3.0    A
STAT 0551 B   PROB & MATH STATISTICS     3.0    A-
STAT 0696     STATISTICAL COMMUNICATION  3.0    A-
MATH 0254     INTRO TO LINEAR ALGEBRA    3.0    A
STAT 0720     SEMINAR                    1.0    A
STAT 0575     ACTUARIAL MODELING         3.0    A