
Data Scientist

Location:
Spokane, WA, 99223
Salary:
Open
Posted:
December 31, 2024


Resume:

MELVIN OTT, Ph.D.

509-***-****

PROFESSIONAL SUMMARY

Update 2022-2024

Currently teaching the Capstone course and the Supervised Learning course in the MS Data Science program at Northwestern University. Supervised Learning is a core course for model building in AI using logistic regression. Integrated AI into the Supervised Learning course during the summer of 2024.

Student projects in the Capstone course:

• Movie Recommendation System

• Fallen Angel Bankruptcy Predictions

• Machine Learning Chatbot

• Predicting Customer Churn in Financial Industry

• Power Demand on Grid from Electric Vehicles

• Pricing Optimization in the Retail Industry

Supervised Learning Course:

• Linear Regression where the response variable is continuous.

• Other methods where the response variable follows a Poisson, binomial, or gamma distribution.

• Course description included.

Data Science Consulting:

• Calculated lost account value for seven class action legal cases where the transactions numbered from 60 million to over 100 million records. Each record contained over 50 columns of data.

• When someone asked what I did today, I could respond that I did over 100 million things, sometimes many times over.

Select Accomplishments

• Qualified and testified as an expert witness in statistics in both state and federal court, in civil and criminal cases.

• Delivered paper at JSM 2010 in Vancouver BC, “Simulation for Golf Resort Overbooking”.

• Delivered papers at international conferences on threshold levels for logistic regression in health care and in banking, May and October 2006.

• Statistical consultant for high profile legal case involving large volume of health care claims data.

• Demonstrated consistent budget control and project performance, receiving salary bonuses for 1994, 1995, 1998, 1999, 2000, 2001, 2002, and 2004.

• Utilized conjoint analysis to identify the best design features for an AIDS prevention product.

• Developed inpatient hospital payment models for the design of State Workers Comp payment, in Georgia, Vermont, New Jersey, and Nebraska. Delivered the models and testified in open forums.

• Designed & implemented cost containment program for physician and hospital payment that resulted in savings of $10M/yr.

Honors & Awards

• Appointed by the Governor of the State of Washington as Regional Coordinator for the Superconducting Super Collider bid.

• Selected by Member Vote as the Publications Officer for ASA Health Policy Statistics Section 1998-1999.

• Selected to serve on Executive Committee Eastern Oregon University Foundation.

YouTube video: https://www.youtube.com/watch?v=_lKHxedpGYY&t=216s

Link to a Shiny R application:

https://melvin.shinyapps.io/SScatter/

PROFESSIONAL BACKGROUND

Melvin Ott & Associates 6/05-Present

President & CEO

Provide statistical research, predictive modeling and expert testimony to a broad range of clients.

Northwestern University 6/12 – Current

Adjunct Professor

Teach Data Science courses in the MSDS Program at NU. The core course covers general linear models, regression analysis, logistic regression, exploratory data analysis, cluster analysis, principal components analysis, and factor analysis. Python, R, and SAS programming are currently used for instruction, including TensorFlow and Keras. Thesis advising for Master's students:

• “Mining the NFL”, Matthew R Snee, May 2014

• “Analyzing and Aiding Sports Analytics in Television Journalism”, Robert “Edward” Egros, March 2015

• “The Role of the Shutdown Inning in Major League Baseball”, Charles Gray, May 2014

• “Predicting the Future Success of Minor League Catchers Using Classification Algorithms and Quantitative and Qualitative Data”, William Young

Gonzaga School of Business, Whitworth College and Eastern Washington University, Spokane, WA 1/03 – 6/12

Adjunct Professor

Taught Market Research and the Analysis of Demographic Data and Statistical Design in the MBA program and at the undergraduate level.

Company Confidential 6/05 – 12/05

Statistician Consultant

Researched and wrote two reports regarding lending practices for a large national bank. The first report was an aggregate national view, while the second focused on several states.

Reports revealed statistically significant differences by race, gender and ethnicity for high rate loans and loan denials.

Ingenix, Inc., Salt Lake City, UT 12/03 - 6/05

Director/Consultant

Directed and performed several research studies for new product development. Studies utilized modeling with SAS programs running over the web to access Oracle-based claims data in the large Ingenix data repository. One study used the lognormal distribution to assess sample-size limits for estimating extreme percentiles of physician charges.

• Designed & implemented SAS interactive testing programs to run over the web that enabled multiple users to conduct the tests.

• Developed inpatient hospital payment models for the design of State Workers Comp payment, in Georgia, Vermont, New Jersey, and Nebraska. Delivered the models and testified in open forums.

• Trained other staff members on SAS Programming & Recovery Services random sampling for claims auditing.

Sacred Heart Medical Center, Spokane, WA 2/96 - 4/03

Manager, then Director for Data Administration

Provided the ongoing Kaplan-Meier survival analysis for the kidney transplant program. Projected future demand for hospital services based on statistical, demographic and utilization models.

• Future demand report resulted in a $150M expansion of the medical center.

Developed clinical quality improvement analysis for a broad range of projects and special studies using statistical tools such as logistic regression modeling, Fisher's exact test, and others.

Successfully designed and maintained a Hospital Performance Measurement System with standardized reporting and automated statistical significance testing for a JCAHO Performance Measurement System.

• System was then marketed to other hospitals in the area through Inland Northwest Health Services.

Regence-Blue Shield of Idaho, Lewiston, ID 6/92 - 2/96

Manager, then Director for Managed Care & Reimbursement

Directed startup operation of a new managed care product called HealthSense. Designed & implemented cost containment program for physician and hospital payment.

• Resulted in a net savings of $10M/yr.

Designed and implemented reimbursement programs for traditional and managed care programs.

Designed and implemented new reimbursement programs for physicians, and hospitals.

• Payment systems standardized the methodology using RBRVS and DRGs. Hired and trained the analysts to maintain these new programs using SAS for analytics.

Gonzaga University, Spokane, WA 9/87 - 6/92

Assistant Professor, Operations Management - School of Business

Taught MBA and undergraduate courses in statistical decision-making, predictive modeling, sampling, statistical analysis, and linear programming. Provided consulting to clients in the utility industry, industry associations, city government, and a private law firm.

EDUCATIONAL BACKGROUND

• PhD in Statistics/Operations Research from Oregon State University, Corvallis, Oregon

• MS in Mathematics from Utah State University, Logan, Utah

• BS in Math/Science from Eastern Oregon University, La Grande, Oregon

CONSULTING & EXPERT WITNESS

• Currently retained and working on class action cases in seven states against a large insurance company – Plaintiff

• Expert witness for Medicare fraud case - Defendants

• Expert witness for overtime case - Defendants

• Expert witness for USDOJ housing discrimination case

• Expert witness for Medicaid/Medicare and fee-for-service health care claims data - Defendant

• Expert witness for age and gender discrimination case in Columbus, OH - Plaintiff

• Expert Witness for discrimination case in Grant County, WA - Plaintiff

• Expert witness e-coli case – Defendant

• Expert witness gender discrimination case against a large grocery chain - Defendant

• Expert witness federal court for age discrimination case - Plaintiff

• Expert witness for discrimination case in higher education – Plaintiff

• Expert witness for several other cases settled out of court

• Expert witness for arbitration hearing regarding salary dispute for firefighters

• Statistical Analysis of National Banking Data for 2004 Using National FFIEC Census Demographic Data

• Sample Design & Analysis for Columbia Energy & Environmental Services Richland, WA

• EPA Statistician Certification for Licensing for Statistical Inventory Reconciliation Software for leak detection in Underground Storage Tanks

• Economic Impact report for Washington Aggregate & Concrete Association

• Litigation analysis for scientific and technical issues in mining industry – delivered written and verbal testimony on a national ambient air standard for lead

What is Supervised Learning?

By Melvin Ott, PhD

Since many people in business are looking at ChatGPT (some are looking with concern), I decided to introduce Supervised Learning with 'answers from ChatGPT'. See some of the discussion at https://www.wsj.com/articles/artificial-intelligence-generative-ai-chatgpt-kissinger-84512912?mod=Searchresults_pos12&page=1

From ChatGPT:

“Supervised learning is a type of machine learning in which an algorithm learns from labeled data to make predictions or decisions about new, unseen data. In supervised learning, the algorithm is given a set of inputs and their corresponding outputs, also known as labels or targets. The algorithm uses this labeled data to learn the relationship between the input and output, so it can make accurate predictions on new, unseen data. For example, suppose you want to build a spam filter for emails. In supervised learning, you would start by gathering a dataset of emails, some of which are labeled as spam and others that are labeled as not spam (ham). You would then train a machine learning algorithm on this labeled data so that it learns the patterns that distinguish spam emails from non-spam emails. Once the algorithm is trained, it can use what it learned to predict whether new, unseen emails are spam or not.

Some common examples of supervised learning algorithms include decision trees, logistic regression, support vector machines, and neural networks. Supervised learning can be used for a wide range of applications, such as image classification, speech recognition, natural language processing, and recommendation systems.”
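The spam-filter workflow ChatGPT describes can be sketched concretely. Below is a minimal sketch in Python, assuming a single hypothetical numeric feature per email (a count of suspicious words, invented for illustration) rather than real email text; a logistic model is fit by plain gradient descent instead of a library routine.

```python
import numpy as np

# Toy labeled data: one hypothetical feature per email
# (count of suspicious words), with labels 1 = spam, 0 = ham.
X = np.array([0., 1., 2., 6., 7., 8.])
y = np.array([0., 0., 0., 1., 1., 1.])

# Fit a one-feature logistic regression by gradient descent on the log-loss.
w, b = 0.0, 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))  # predicted P(spam)
    w -= 0.1 * np.mean((p - y) * X)         # gradient step for the weight
    b -= 0.1 * np.mean(p - y)               # gradient step for the intercept

def predict(x):
    """Predicted probability that a new, unseen email is spam."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

print(predict(0.5))   # low probability -> ham
print(predict(7.5))   # high probability -> spam
```

The labeled examples play the role of the training emails; once fit, the model scores new inputs it never saw during training, which is exactly the supervised-learning loop described above.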

ChatGPT left out the Generalized Linear Model. So, what is a Generalized Linear Model (GLM)?

ChatGPT does better with GLM...

From ChatGPT again...

“A generalized linear model (GLM) is a type of statistical model that extends the linear regression model to accommodate non-normal response variables or non-constant variance. In linear regression, the response variable is assumed to be normally distributed, with constant variance. However, in many real-world situations, the response variable may not be normally distributed and/or may exhibit non-constant variance. The generalized linear model allows for a wider range of response variable distributions, such as the Poisson, binomial, and gamma distributions. Additionally, it allows for modeling the relationship between the response variable and the predictor variables using a link function, which can be used to model nonlinear relationships. The basic components of a generalized linear model are:

1. Response variable: The variable being modeled, which can be continuous or discrete.

2. Predictor variables: Variables that are used to explain variation in the response variable.

3. Link function: A function that relates the expected value of the response variable to the linear predictor of the predictor variables.

4. Distribution: The distribution of the response variable, which can be any member of the exponential family of distributions.

GLMs are used in many fields such as biology, ecology, economics, and social sciences, where the response variable may not follow a normal distribution. They are also useful in cases where the variance of the response variable is not constant or when the relationship between the predictor and response variable is non-linear.”
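The four components listed in the quote can be shown working together in a few lines. The sketch below fits a Poisson GLM with a log link by maximum likelihood; the count data are made up for illustration, and the fit uses a general-purpose optimizer rather than a dedicated GLM library.

```python
import numpy as np
from scipy.optimize import minimize

# Toy count data (hypothetical): counts grow roughly
# exponentially in x, which is what a log link implies.
x = np.array([0., 1., 2., 3., 4.])
y = np.array([1., 2., 4., 9., 20.])   # response: counts (Poisson component)

def neg_log_lik(beta):
    # Linear predictor eta = b0 + b1*x; log link gives mu = exp(eta).
    mu = np.exp(beta[0] + beta[1] * x)
    # Poisson negative log-likelihood, dropping the constant log(y!) term.
    return np.sum(mu - y * np.log(mu))

fit = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
b0, b1 = fit.x
print(b1)   # slope on the log scale: the estimated growth rate of the counts
```

Each piece of the code maps onto the list above: `y` is the response, `x` the predictor, `np.exp` the inverse of the log link, and the Poisson likelihood supplies the distribution from the exponential family.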

ChatGPT does a nice job of summarizing the topics of Supervised Learning and GLM. However, the mention of a link function needs further explanation. To explain the link function, I have to give a high-level view of linear regression and the mathematical notation.

Preview of GLM and the Link Function

The link function does something very special: it provides a unifying view of the generalized linear model.

First, for linear regression, the link function is the identity function η = μ. See the bullet points next for the high-level view of linear regression, followed by the linear regression notation that defines η and μ. For the detailed view of linear regression, see the video in Course Recordings.

High Level View of Linear Regression

• Purpose of linear regression:

o Explain differences

o Predict a value

• Issues with linear regression

o Independent variables are independent

o Strength & direction of association

o p-values

o Confounding

o Limits on prediction

• Linear regression minimizes squared error, SSE

• Adjusted and interpreting coefficients

• Assumptions

• Robust regression procedure and influential observations

• OLS regression results are also robust. (So what is the difference in the use of the word "robust"? Robust in linear regression means that you can violate the assumptions and still get good results.)
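The bullet about minimizing squared error can be demonstrated directly. This sketch, using hand-picked toy data, fits a line with numpy's least-squares routine and checks that any other slope gives a larger SSE:

```python
import numpy as np

# Toy data with a noisy linear trend (values chosen by hand for illustration).
X = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column; lstsq finds the
# coefficients that minimize the sum of squared errors (SSE).
A = np.column_stack([np.ones_like(X), X])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

sse_fit = np.sum((y - (b0 + b1 * X)) ** 2)
# Perturbing the slope away from the OLS solution can only increase the SSE.
sse_other = np.sum((y - (b0 + (b1 + 0.5) * X)) ** 2)
print(sse_fit, sse_other)
```

This is the defining property of OLS: among all candidate lines, the fitted one has the smallest SSE.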

Linear Regression Notation and the Link Function

For a linear regression model, let X1, X2, …, Xk represent the independent variables in the model, and let β0, β1, …, βk represent the intercept and regression coefficients. Then set η = β0 + β1X1 + … + βkXk (η is the Greek letter eta), and let μ = E(Y), where E(Y) is the expected value of Y (the values of the response variable that you are attempting to predict with your model).

The purpose is to identify a function η = g(μ) that links η and μ. The identity function η = μ is used when the errors (residuals) follow a normal distribution, meaning there is no transformation between η and μ.

Other Link Functions

Other link functions (this will make more sense after you work with some of the datasets later in the course) will involve a transformation between η and μ for when the Y data follows a different distribution:

Use the logit as the link function, η = ln(μ/(1 − μ)), for the logistic regression model when the dependent variable is a binomial random variable.
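The logit link and its inverse (the logistic function) can be checked numerically. A minimal sketch, using an arbitrary probability value for illustration:

```python
import numpy as np

def logit(mu):
    # Link function: eta = ln(mu / (1 - mu)),
    # maps a probability in (0, 1) to the whole real line.
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    # Inverse link (the logistic function): maps eta back to a probability.
    return 1.0 / (1.0 + np.exp(-eta))

mu = 0.8
eta = logit(mu)          # transform the mean onto the linear-predictor scale
print(inv_logit(eta))    # round trip recovers the original 0.8
```

The round trip shows why the link is useful: the model works with η, which is unconstrained, while μ stays a valid probability.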

Use the inverse standard normal distribution function, η = Φ⁻¹(μ), for the link function when the dependent variable is dichotomous with an underlying continuous scale that corresponds to probabilities. This leads to a probit regression model.

Use the log link, η = ln(μ), when the dependent variable is a count (Poisson or negative binomial distributions for the number of possible outcomes of the random variable).

An additional link function is the generalization of the logit link function to more than two categories for the dependent variable. This results in the multinomial logistic model. We will not cover the generalization of the logit function in this course.

Note, the link function is determined by the random component of the dependent variable. The model explains the systematic part of the dependent variable. The errors or residuals are assumed to have a particular probability distribution that determines the link function. When we get to the second part of the course, you will need to refresh your understanding of the binomial distribution, the Poisson distribution, and the negative binomial distribution.

In case you would like to know, next are some notes on the Probit Model:

• The term “probit” is short for “probability unit”.

• Originally developed in connection with bioassay to convert the percentage of pests killed by a pesticide to a probability.

• Response variable is 1 or 0. Parameters estimated by maximum likelihood.

• Probit Model is based on the cumulative distribution function of the standard normal.

• Advantage of Probit over logistic is easier to transform directly to probabilities using standard normal tables or software that calculates standard normal.

• Interpret coefficients with both continuous and discrete variables:

1. Compute predicted probabilities for discrete variables at the mean of the continuous variables.

2. Compute predicted probabilities for all cross-classifications of discrete variables at the mean of the continuous variables.

3. The model gives a value that is a z-score. To convert this to a probability, use Excel to find the probability that Z is less than the value given by the model: =NORMSDIST(z) gives the probability, where z is the probit model prediction.
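The same z-to-probability conversion can be done outside Excel with the standard normal CDF. A short sketch, using a hypothetical probit prediction of z = 1.25:

```python
from scipy.stats import norm

# Suppose a fitted probit model predicts z = 1.25 for some covariate
# pattern (a hypothetical value). The predicted probability is the
# standard normal CDF at z -- the same number Excel's =NORMSDIST(1.25)
# would return.
z = 1.25
prob = norm.cdf(z)
print(round(prob, 4))   # about 0.8944
```

This is the "advantage of probit" noted above: the model's linear predictor converts directly to a probability through the standard normal distribution.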


