Missing data – Page 13 – The Stats Geek

Multiple imputation followed by deletion of imputed outcomes

September 10, 2015 by Jonathan Bartlett

In 2007, Paul von Hippel published a nice paper proposing a variant of the conventional multiple imputation (MI) approach to handling missing data. The paper advocated a multiple imputation followed by deletion (MID) approach. The context considered was one where we are interested in fitting a regression model for an outcome Y with covariates X, and some Y and X values are missing. The approach consists of running imputation as usual, imputing missing values in both Y and X, but then discarding those records where the outcome Y was imputed. The reduced datasets, with missing X values imputed but only observed Y values, are then analysed as usual, with results combined using Rubin’s rules.
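The two steps that distinguish MID from standard MI, deleting records with imputed outcomes and then pooling, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `delete_imputed_outcomes` and `rubin_combine` are hypothetical, the imputation and model-fitting steps are assumed to happen elsewhere, and the pooling shown is the standard Rubin's rules formula for a scalar estimate.

```python
from statistics import mean, variance


def delete_imputed_outcomes(imputed_rows, y_was_missing):
    """Deletion step of MID: keep only records whose outcome Y was observed.

    imputed_rows  -- records from one imputed dataset (X and Y filled in)
    y_was_missing -- booleans, True where Y was originally missing
    """
    return [row for row, miss in zip(imputed_rows, y_was_missing) if not miss]


def rubin_combine(estimates, variances):
    """Pool M per-dataset estimates of a scalar parameter with Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/M) * B, where W is the mean within-imputation variance
    and B is the between-imputation variance of the estimates.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor M - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Toy usage with made-up numbers: three imputed datasets, each already
# reduced with delete_imputed_outcomes and analysed to give a slope
# estimate and its squared standard error.
pooled_estimate, total_variance = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In practice the rows passed to `delete_imputed_outcomes` would come from whatever imputation software is used, and the same missingness indicator for Y is applied to every imputed dataset.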

Substantive model compatible imputation of covariates – smcfcs in R

May 12, 2015 by Jonathan Bartlett

I’m pleased to announce the release of an R package, smcfcs, which implements multiple imputation of missing covariates using substantive model compatible fully conditional specification. As described in a previous post, this is a modified version of the popular fully conditional specification, or chained equations, approach to multiple imputation (e.g. as implemented in the excellent MICE package).

smcfcs is an attractive approach when the outcome or substantive model includes interactions or non-linear covariate effects, or is itself a non-linear model, such as Cox’s proportional hazards model. In these cases, it can be difficult, or sometimes impossible, to directly specify an imputation model for partially observed covariates that is compatible with the outcome/substantive model. Such incompatibility can lead to biased estimates, due to mis-specification of the imputation model. smcfcs resolves this potential problem by ensuring that each partially observed covariate is imputed from an imputation model which is compatible with a user-specified outcome/substantive model.
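To give a feel for what "compatible" means here, the following Python sketch imputes a covariate by rejection sampling, so that the draws respect an assumed outcome model. It is a toy illustration, not the smcfcs implementation: the function name `impute_x_compatible`, the outcome model y = beta * x^2 + error, the standard normal model for x, and the parameter values are all assumptions made for the example, and the parameters are treated as known rather than drawn from their posterior.

```python
import math
import random


def impute_x_compatible(y, rng, beta=1.0, sigma=0.5):
    """Draw one value of a missing x given the observed outcome y.

    Assumed (hypothetical) models:
      covariate model: x ~ N(0, 1)
      outcome model:   y | x ~ N(beta * x**2, sigma**2)
    The target density is proportional to p(x) * f(y | x). We propose
    from p(x) and accept with probability f(y | x) / max f, which for a
    normal outcome model is exp(-residual^2 / (2 * sigma^2)).
    """
    while True:
        x = rng.gauss(0.0, 1.0)  # proposal from the covariate model
        resid = y - beta * x * x
        accept_prob = math.exp(-resid * resid / (2.0 * sigma * sigma))
        if rng.random() < accept_prob:
            return x


# Toy usage: with y = 4 and a quadratic outcome model, accepted draws
# concentrate around x = 2 and x = -2, which a linear imputation model
# for x given y could not reproduce.
rng = random.Random(0)
draws = [impute_x_compatible(4.0, rng) for _ in range(200)]
```

The point of the sketch is that the imputation distribution for x is never written down directly; it is implied by the covariate model and the outcome model together, which is what keeps the two consistent.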

smcfcs is available for R on CRAN. It supports linear and logistic regression outcome models, as well as Cox proportional hazards models for censored time-to-event outcomes. Competing risks outcomes can also be accommodated through specification of Cox models for each cause-specific hazard function. A Stata version is also available, and can be installed from within Stata from the SSC archive using: ssc install smcfcs

Including the outcome in imputation models of covariates

May 7, 2015 by Jonathan Bartlett

Multiple imputation has become a popular approach for handling missing data (see www.missingdata.org.uk). Suppose that we have an outcome (dependent variable in our model of interest) Y, and a covariate X. Suppose further that X contains some missing values, and that we are happy to assume that these satisfy the missing at random assumption. Then we might consider using multiple imputation to impute the missing values in X. A natural question that then follows is whether, in the imputation model for X, the variable Y should be included as a covariate. Particularly when Y is a variable measured later in time than X, our intuition may lead us to think that it is inappropriate to use the future information contained in Y when imputing X. This, however, is not the case.
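A small simulation makes the point concrete. The Python sketch below is an illustration under assumed settings, not an analysis from the original post: X is standard normal, Y = X + noise, half of the X values are missing completely at random, and each missing X is filled in with a single stochastic draw (parameter uncertainty is ignored for brevity, so this is not proper multiple imputation). The function names `slope` and `simulate` are hypothetical.

```python
import random
from statistics import mean


def slope(x, y):
    """OLS slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, seed=1):
    """Return the slope of Y on X after imputing X without and with Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]  # true slope is 1
    missing = [rng.random() < 0.5 for _ in range(n)]  # MCAR, about 50%

    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]

    # Imputation model ignoring Y: draw from the marginal distribution of X.
    mu = mean(x_obs)
    sd = (sum((xi - mu) ** 2 for xi in x_obs) / (len(x_obs) - 1)) ** 0.5
    x_no_y = [rng.gauss(mu, sd) if m else xi for xi, m in zip(x, missing)]

    # Imputation model including Y: linear regression of X on Y,
    # fitted to the complete cases, plus residual noise.
    b = slope(y_obs, x_obs)  # regress X on Y
    a = mu - b * mean(y_obs)
    resid = [xi - (a + b * yi) for xi, yi in zip(x_obs, y_obs)]
    rsd = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5
    x_with_y = [
        a + b * yi + rng.gauss(0.0, rsd) if m else xi
        for xi, yi, m in zip(x, y, missing)
    ]

    return slope(x_no_y, y), slope(x_with_y, y)


slope_without_y, slope_with_y = simulate()
```

Under these assumptions, the slope estimated after imputing X without Y is attenuated towards zero, because the imputed X values are unrelated to Y by construction, whereas including Y in the imputation model recovers a slope close to the true value of 1.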