Data Analyst, SAS, SQL

Location: Dallas, TX

Posted: June 05, 2016


Resume:

Weibin Ye

Dallas, TX ***** / ******.**.****@*****.*** / Cell 469-***-****

LinkedIn: https://www.linkedin.com/pub/weibin-ye/89/972/42a

Position: Test Data Analyst

CERTIFICATIONS

SAS Certified Business Analyst (No. SBARM002592v9)

SAS Certified Advanced Programmer (No. AP017353v9)

SAS Certified Base Programmer (No. BP059751v9)

SKILLS

SAS Predictive Modeling

SAS, SQL, Macro

MS Access, Excel

Languages: English, Chinese, Cantonese

EDUCATION

University of Texas at Dallas, Richardson, TX

MS Information Technology and Management. GPA: 3.4. May 2016

Guangdong University of Finance, China

BS Economics, Finance & Monetary Management. GPA: 3.3. June 2012

ACADEMIC PROJECTS

Customer Purchase Behavior Analysis Model (SAS 9.4) Spring 2016

Dataset records 2007 customer purchases from two competing booksellers, Amazon and B&N, and includes customer demographics.

Cleaned data, then combined BY statement in PROC SORT with FIRST.variable/LAST.variable in DATA step to group records, and output result to temporary dataset.

To predict number of books customer will buy, assumed number of books bought from both websites was Poisson distributed, then fed temporary dataset to Negative Binomial Distribution (NBD) regression model and Poisson regression model.

Fitted NBD and Poisson regression models by maximum-likelihood estimation.

Compared two models using likelihood ratio test and found no significant difference between them.

Constructed new variables, percentage of weekend purchases and degree of loyalty, to improve model performance, then reran both models; AIC and BIC were much lower than before.

Coded Amazon as 1 and B&N as 0, then built logistic regression model to identify factors that make customers prefer Amazon to B&N.
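
Illustrative sketch of model comparison step above (not project's SAS code): a minimal Python version, assuming intercept-only models with no covariates, a crude grid search in place of a real optimizer, and hypothetical toy counts. Function names are invented for illustration. It fits Poisson and NBD by maximum likelihood, then computes likelihood ratio statistic, AIC and BIC.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson(mu) counts."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def negbin_loglik(counts, mu, r):
    """Log-likelihood of negative binomial counts with mean mu and size r."""
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )


def compare_count_models(counts):
    """Fit intercept-only Poisson and NBD models and compare them."""
    n = len(counts)
    mu = sum(counts) / n  # MLE of the mean under both intercept-only models
    ll_pois = poisson_loglik(counts, mu)

    # Assumption: crude log-spaced grid (1e-3 to 1e6) for the NBD size parameter.
    grid = [10 ** (k / 50) for k in range(-150, 301)]
    r_hat = max(grid, key=lambda r: negbin_loglik(counts, mu, r))
    ll_nb = negbin_loglik(counts, mu, r_hat)

    # Likelihood ratio statistic, referred to chi-square with 1 degree of freedom.
    # This p-value is conservative because Poisson is a boundary case of NBD.
    lr_stat = max(0.0, 2.0 * (ll_nb - ll_pois))
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))

    return {
        "r_hat": r_hat,
        "lr_stat": lr_stat,
        "p_value": p_value,
        "aic_poisson": 2 * 1 - 2 * ll_pois,  # 1 parameter: mu
        "aic_nbd": 2 * 2 - 2 * ll_nb,        # 2 parameters: mu, r
        "bic_poisson": 1 * math.log(n) - 2 * ll_pois,
        "bic_nbd": 2 * math.log(n) - 2 * ll_nb,
    }


# Hypothetical toy data: books bought per customer (not the project's dataset).
toy_counts = [0, 0, 0, 1, 1, 2, 2, 3, 5, 9, 14]
result = compare_count_models(toy_counts)
print(result)
```

Lower AIC or BIC indicates better trade-off between fit and number of parameters, which is how rerun models with new variables would be judged.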

Building Plane Ticket Online Booking Predictive Model (Enterprise Miner 13.1) Spring 2016

Adjusted decision weights, setting value of true positive to two times value of false negative and false positive.

Applied StatExplore to data set to report variables’ skewness and missing values.

Applied Replacement to correct data entry errors; applied Impute to impute median for interval variables, with tree surrogate as class variables’ input method.

Applied Decision Tree and Gradient Boosting to build models DT1 and GB1.

Reduced interval variables’ skewness by choosing log10 method in Transform Variables, and then applied Regression to build regression model Reg1.

Applied Variable Selection and Neural Network to build model NN1.

Connected models NN1, Reg1, DT1 and GB1 to node Model Comparison, then ran process.

To improve model performance, took stratified sample of data set before feeding it to models.

Used bagging to improve model performance: built several decision trees with different random seeds and used Ensemble tool to find best model.

Compared performance of five models and chose champion model.
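
Illustrative sketch of bagging step above (not Enterprise Miner's implementation): a minimal Python version, assuming one-feature decision stumps in place of full decision trees, majority voting as ensemble rule, and hypothetical toy data. All function names are invented for illustration. Each model is trained on bootstrap sample drawn with a different random seed.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-feature threshold classifier by minimizing training errors.

    rows is a list of (x, label) pairs with label 0 or 1.
    Returns (threshold, above), where `above` is the label predicted for x > threshold.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for above in (0, 1):
            errors = sum(
                (above if x > threshold else 1 - above) != label
                for x, label in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return threshold, above


def predict_stump(stump, x):
    threshold, above = stump
    return above if x > threshold else 1 - above


def bagged_stumps(rows, seeds):
    """Train one stump per seed, each on a bootstrap sample of rows."""
    models = []
    for seed in seeds:
        rng = random.Random(seed)
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        models.append(fit_stump(sample))
    return models


def predict_bagged(models, x):
    """Majority vote across models (use an odd number of models to avoid ties)."""
    votes = Counter(predict_stump(model, x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature value, booked flag); not the project's data set.
toy_rows = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
toy_seeds = [1, 2, 3, 4, 5]
models = bagged_stumps(toy_rows, toy_seeds)
print([predict_bagged(models, x) for x in (1, 9)])
```

Fixing seeds makes each bootstrap sample reproducible, so ensemble can be rebuilt and compared against single-tree models.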
