Comparing predictive ability of two nested logistic regression models

A very common situation in biostatistics, but also much more broadly of course, is that one wants to compare the predictive ability of two competing models. A key question of interest often is whether adding a new marker or variable Y to an existing set X improves prediction. The most obvious way of testing this hypothesis is to use a regression model, and then test whether adding the new variable Y improves fit, by testing the null hypothesis that the coefficient of Y in the expanded model differs from zero. An alternative approach is to test whether adding the new variable improves some measure of predictive ability, such as the area under the ROC curve.

Read more

Checking functional form in logistic regression using loess plots

When we include a continuous variable as a covariate in a regression model, it’s important that we include it using the correct (or something approximately correct) functional form. For example, with a continuous outcome Y and continuous covariate X, it may be the case that the expected value of Y is a linear function of X and X^2, rather than a linear function of X. For linear regression there are a number of ways of assessing what the appropriate functional form is for a covariate. A simple but often effective approach is simply to look at a scatter plot of Y against X, to visually assess the shape of the association.

Read more

Area under the ROC curve – assessing discrimination in logistic regression

In a previous post we looked at the popular Hosmer-Lemeshow test for logistic regression, which can be viewed as assessing whether the model is well calibrated. In this post we’ll look at one approach to assessing the discrimination of a fitted logistic model, via the receiver operating characteristic (ROC) curve.

Read more