Measurement error / misclassification – Page 2

When we fit regression models, we implicitly assume that the values in our dataset are accurate measurements of the variables of interest. In many settings, the measurements we actually have are imperfect. In the case of a categorical variable, for some of the records in our dataset the observed value may differ from the true value, due to misclassification. Misclassification arises for many different reasons. In epidemiology, instruments are often used to measure conditions imperfectly – sometimes observations which should be recorded as 1 are recorded as 0, and vice-versa. In this post I’ll focus on the common situation where logistic regression is used to model an outcome $Y$ , and one of the covariates is subject to misclassification.