Adjusting for covariate misclassification in logistic regression – predictive value weighting

When we fit regression models, we implicitly assume that the values in our dataset are accurate measurements of the variables of interest. In many settings, the measurements we actually have are imperfect. For a categorical variable, the observed value may differ from the true value for some of the records in our dataset, due to misclassification. Misclassification arises for many different reasons. In epidemiology, for example, the instruments used to measure a condition are often imperfect: some observations which should be recorded as 1 are recorded as 0, and vice versa. In this post I’ll focus on the common situation where logistic regression is used to model an outcome Y, and one of the covariates is subject to misclassification.

Area under the ROC curve – assessing discrimination in logistic regression

In a previous post we looked at the popular Hosmer-Lemeshow test for logistic regression, which can be viewed as assessing whether the model is well calibrated. In this post we’ll look at one approach to assessing the discrimination of a fitted logistic model, via the receiver operating characteristic (ROC) curve.
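As a rough illustration of what discrimination means here, the area under the ROC curve can be computed as the proportion of (case, control) pairs in which the case has the higher predicted probability, with ties counted as one half. The sketch below is not code from the post: the function name `auc` and the toy predicted probabilities are assumptions made purely for illustration.

```python
def auc(probs, outcomes):
    """Area under the ROC curve via the concordance (c) statistic.

    probs    -- predicted probabilities from some fitted model (hypothetical here)
    outcomes -- observed binary outcomes, coded 0/1
    """
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0   # case ranked above control
            elif p_case == p_control:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


# Toy data (made up): 3 of the 4 case/control pairs are concordant.
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_outcomes = [0, 0, 1, 1]
toy_auc = auc(toy_probs, toy_outcomes)
```

A value of 0.5 corresponds to predictions that discriminate no better than chance, and a value of 1 to predictions that perfectly separate those with Y=1 from those with Y=0.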

Deviance goodness of fit test for Poisson regression

In this post we’ll look at the deviance goodness of fit test for Poisson regression with individual count data. Many software packages either report this test in the output when fitting a Poisson regression model or can perform it after fitting such a model (e.g. Stata), which may lead researchers and analysts into relying on it. We’ll see that the test often does not perform as expected and therefore, I argue, ought to be used with caution.
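For readers who have not seen the statistic written out, here is a minimal sketch of the Poisson deviance, assuming the simplest possible case: an intercept-only model, whose fitted mean for every observation is the sample mean. The function name `poisson_deviance` and the toy counts are assumptions for illustration, not taken from the post. The test compares the deviance with a chi-squared distribution on n minus p degrees of freedom, where p is the number of estimated parameters.

```python
import math


def poisson_deviance(y, mu):
    """Deviance of a Poisson model: 2 * sum(y*log(y/mu) - (y - mu)).

    y  -- observed counts
    mu -- fitted means from the model (must be positive)
    The y*log(y/mu) term is taken as 0 when y == 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


# Toy counts (made up). For an intercept-only model the maximum
# likelihood estimate of the mean is simply the sample mean.
counts = [0, 1, 2, 3, 4]
fitted_mean = sum(counts) / len(counts)
fitted = [fitted_mean] * len(counts)

deviance = poisson_deviance(counts, fitted)
df = len(counts) - 1  # n observations minus 1 estimated parameter
```

The deviance would then be referred to a chi-squared distribution on `df` degrees of freedom to obtain a p-value.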
