Logistic Forest

Location:

Irvine, CA, 92620

Posted:

October 04, 2022


Resume:

YAN CHENG

*** **********, ******, ** ***** ************@*****.*** 803-***-****

EDUCATION: Master of Statistics, Texas A&M University, Jan 2021 - Aug 2022, GPA 3.82

CERTIFICATES: SAS Certified Base Programmer

PROJECTS (see https://github.com/Yan8866 for more details)

Lung Cancer Data Analysis (R): Compared four machine learning models (Logistic Regression, Decision Tree, Pruned Tree and Random Forest), then selected Logistic Regression, the model with the best interpretability and predictive power, to identify significant predictors of lung cancer and to predict how likely a person with given features is to develop lung cancer.

Volatility Analysis of S&P 500 Returns (R): Built ARCH(1) and GARCH(1,1) models to analyze and predict the volatility of S&P 500 returns.

Credit Card Fraud Detection (Python): Trained Linear Regression, Decision Tree, Random Forest and Support Vector Machine models with the Scikit-Learn package, used them to predict how likely a transaction is to be fraudulent, then compared their prediction error rates. Random Forest had the greatest predictive power.

Insurance Fraud Detection (Python): Used pandas, NumPy, seaborn, Matplotlib, Plotly, missingno, Scikit-Learn and XGBoost to analyze an insurance data set with 1000 entries and 40 variables; trained nine classifiers, including SVM, KNN, Decision Tree and Random Forest. Decision Tree had the best prediction accuracy, with a test accuracy of 0.816.

TECHNICAL SKILLS

Using version control system Git and its host GitHub

Familiar with big data tools like Hadoop and Spark

Using Docker to share work and reproduce project results

Using Tableau to build dashboards, visualize and analyze data

Supervised & Unsupervised Learning: Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, K-Nearest Neighbors, Bagging, Random Forest, Boosting, Support Vector Machines, Principal Component Analysis, Clustering Analysis

Experiment design and sampling methodologies

Building predictive models using frequentist and Bayesian methods

Parametric, semiparametric and non-parametric techniques

Spatial data, time series data and categorical data visualization and analysis

OTHER SKILLS

Excellent written and spoken communication skills

Stakeholder management: planning, negotiation, problem solving, conflict resolution

Organization and time management

PROGRAMMING LANGUAGES: R, Python, SQL, SAS

WORK EXPERIENCE: Classroom Teacher

Sep 2016 - Jun 2017: Marlborough High School (MA, USA)

Aug 2013 - Jun 2015: White Knoll Middle School (SC, USA)

Aug 2011 - Jun 2013: East Point Academy (SC, USA)
