Including the outcome in imputation models of covariates

Multiple imputation has become a popular approach for handling missing data. Suppose that we have an outcome (the dependent variable in our model of interest) Y, and a covariate X. Suppose further that X contains some missing values, and that we are happy to assume that these satisfy the missing at random assumption. Then we might consider using multiple imputation to impute the missing values in X. A natural question that then follows is whether Y should be included as a covariate in the imputation model for X. Particularly when Y is measured later in time than X, our intuition may lead us to think that it is inappropriate to use the future information contained in Y when imputing X. This, however, is not the case.
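To make the point concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration rather than something taken from a real analysis: a linear model in which the true slope of Y on X is 1, half of the X values set to missing completely at random (a special case of missing at random), and a hypothetical helper named `simulate`. For brevity the imputations are drawn using fixed parameter estimates from the complete cases, whereas proper multiple imputation would also draw those parameters from their posterior distribution; with a sample this large the difference should be small.

```python
import numpy as np


def simulate(n=20000, n_imputations=5, seed=0):
    """Compare imputing X without and with the outcome Y (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)        # assumed model: true slope of Y on X is 1
    missing = rng.random(n) < 0.5     # X missing completely at random
    obs = ~missing
    n_mis = int(missing.sum())

    def slope(x_completed):
        # Least-squares slope from regressing Y on the completed X
        return np.cov(x_completed, y)[0, 1] / np.var(x_completed, ddof=1)

    # Imputation model ignoring Y: marginal normal fitted to observed X
    mu, sd = x[obs].mean(), x[obs].std(ddof=1)

    # Imputation model including Y: linear regression of X on Y
    b, a = np.polyfit(y[obs], x[obs], 1)   # returns slope, then intercept
    resid_sd = np.std(x[obs] - (a + b * y[obs]), ddof=2)

    without_y, with_y = [], []
    for _ in range(n_imputations):
        xc = x.copy()
        xc[missing] = rng.normal(mu, sd, n_mis)
        without_y.append(slope(xc))

        xc = x.copy()
        xc[missing] = a + b * y[missing] + rng.normal(0.0, resid_sd, n_mis)
        with_y.append(slope(xc))

    # Average the point estimates across imputations
    return float(np.mean(without_y)), float(np.mean(with_y))


slope_without_y, slope_with_y = simulate()
print(f"imputing X without Y: estimated slope = {slope_without_y:.2f}")
print(f"imputing X with Y:    estimated slope = {slope_with_y:.2f}")
```

Under these assumptions, leaving Y out of the imputation model should pull the estimated slope towards zero, because the imputed X values are unrelated to Y by construction. Including Y should give an estimate close to the true value of 1.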
