Multiple imputation (MI) is a popular approach to handling missing data. In the final part of MI, inferences for parameter estimates are made based on simple rules developed by Rubin. These rules rely on the analyst having a calculable standard error for their parameter estimate from each imputed dataset. This is fine for standard analyses, e.g. regression models fitted by maximum likelihood, where standard errors based on asymptotic theory are easily calculated. However, for many analyses analytic standard errors are not available, or are prohibitively difficult to derive. For such analyses, if there were no missing data, an attractive approach for finding standard errors and confidence intervals is the method of bootstrapping. But if one is using MI to handle missing data, and would ordinarily use bootstrapping to find standard errors / confidence intervals, how should the two be combined?
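For reference, Rubin's rules combine the $M$ complete-data analyses as follows (standard notation, with $\hat{\theta}_m$ and $\widehat{\mathrm{Var}}(\hat{\theta}_m)$ the point estimate and its variance estimate from the m'th imputed dataset):

\[
\bar{\theta} = \frac{1}{M} \sum_{m=1}^{M} \hat{\theta}_m, \qquad
\hat{W} = \frac{1}{M} \sum_{m=1}^{M} \widehat{\mathrm{Var}}(\hat{\theta}_m), \qquad
\hat{B} = \frac{1}{M-1} \sum_{m=1}^{M} (\hat{\theta}_m - \bar{\theta})^2,
\]
\[
\widehat{\mathrm{Var}}(\bar{\theta}) = \hat{W} + \left(1 + \frac{1}{M}\right) \hat{B}.
\]

It is the within-imputation piece $\widehat{\mathrm{Var}}(\hat{\theta}_m)$ that is unavailable when the analysis has no analytic standard error.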
A very nice paper recently submitted to arXiv by Schomaker & Heumann investigates this question. They consider a number of possible ways of combining bootstrapping and MI. The two main approaches are either to first impute the missing data and then use bootstrapping to obtain an estimate of the within-imputation SE for each imputed dataset, or to bootstrap the original data and apply MI separately to each bootstrapped dataset. In the latter case, the MI point estimate from each bootstrap sample is then used to construct bootstrap SEs and confidence intervals in the usual way.
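To make the first ('MI boot') approach concrete, here is a minimal R sketch using the mice package. The linear regression of y on x and the function name mi_boot are illustrative assumptions on my part, not code from the paper:

```r
library(mice)

# Impute the original data once, then bootstrap within each imputed dataset
# to get a within-imputation SE, and pool with Rubin's rules.
mi_boot <- function(data, m = 10, n_boot = 200) {
  imp <- mice(data, m = m, printFlag = FALSE)
  ests <- numeric(m)
  ses  <- numeric(m)
  for (j in seq_len(m)) {
    comp <- complete(imp, j)                        # j'th completed dataset
    ests[j] <- coef(lm(y ~ x, data = comp))["x"]    # complete-data estimate
    boot_est <- replicate(n_boot, {                 # bootstrap within dataset j
      b <- comp[sample(nrow(comp), replace = TRUE), ]
      coef(lm(y ~ x, data = b))["x"]
    })
    ses[j] <- sd(boot_est)                          # within-imputation SE
  }
  W <- mean(ses^2)                                  # within-imputation variance
  B <- var(ests)                                    # between-imputation variance
  list(estimate = mean(ests),
       se = sqrt(W + (1 + 1/m) * B))                # Rubin's rules total variance
}
```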
Through both simulation and theoretical considerations, Schomaker & Heumann demonstrate that one should use the latter approach: bootstrap the original data, and apply MI to each of the bootstrapped datasets. In contrast, the approach which uses bootstrapping to find a SE for each imputed dataset results in SEs which are much too large and confidence intervals which over-cover. I think this is a very valuable investigation and paper, and recommend that those who need to combine bootstrapping with MI read it.
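A corresponding sketch of the recommended approach bootstraps first and imputes each bootstrap sample. Again the y ~ x model and the name boot_mi are illustrative assumptions, not the authors' code:

```r
library(mice)

# Bootstrap the original (incomplete) data, run MI within each bootstrap
# sample, and use the spread of the MI point estimates for inference.
boot_mi <- function(data, n_boot = 200, m = 10) {
  ests <- replicate(n_boot, {
    boot_data <- data[sample(nrow(data), replace = TRUE), ]
    imp <- mice(boot_data, m = m, printFlag = FALSE)  # impute bootstrap sample
    fits <- with(imp, lm(y ~ x))                      # fit to each imputation
    # MI point estimate: average of the M complete-data estimates
    mean(sapply(fits$analyses, function(f) coef(f)["x"]))
  })
  list(se = sd(ests),                                 # bootstrap SE
       ci = quantile(ests, c(0.025, 0.975)))          # percentile 95% CI
}
```

Note the computational cost: this requires running MI n_boot times, rather than bootstrapping m completed datasets.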
One aspect I cannot quite understand is their explanation for the reason that MI followed by bootstrapping (MI boot) doesn't work. On page 11 they state (correctly I think) that applying the bootstrap to the m'th imputed dataset estimates $\mathrm{Var}(\hat{\theta}_m \mid Y_{\mathrm{obs}}, Y^{(m)}_{\mathrm{mis}})$, the variance of the m'th imputed point estimate of $\theta$, conditional on the observed data and the imputed values in the m'th imputation - i.e. the within-imputation variance of $\theta$. They then write
"These estimates are not identical to the variance which we need to apply (3.2). Combining the M estimates is not meaningful because the quantity we use in our calculation is not $\widehat{\mathrm{Var}}(\hat{\theta}_m)$ but rather M different quantities which are all not unconditional on the missing and imputed data."
I don't really understand this - $\mathrm{Var}(\theta \mid Y_{\mathrm{obs}})$ is the total posterior variance of $\theta$, which is what we want to estimate with Rubin's rules. But the within-imputation variance is supposed to be the average (across the predictive distribution of the missing data) of the posterior variance of $\theta$ conditional on the imputed and observed data. Thinking of it very directly, for a given set of imputed+observed data, the 'MI boot' approach ought to work if the bootstrap SE for that imputed dataset is close to the analytical SE (were we able to calculate it). And I can't see why this wouldn't be the case. If anyone can explain this to me, please add a comment!
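To spell out the decomposition underlying this reasoning (the standard law-of-total-variance argument behind Rubin's rules, in my notation rather than the paper's):

\[
\underbrace{\mathrm{Var}(\theta \mid Y_{\mathrm{obs}})}_{\text{total}}
= \underbrace{E\!\left[\mathrm{Var}(\theta \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \mid Y_{\mathrm{obs}}\right]}_{\text{average within-imputation variance}}
+ \underbrace{\mathrm{Var}\!\left(E[\theta \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}] \mid Y_{\mathrm{obs}}\right)}_{\text{between-imputation variance}}.
\]

On this view, if each bootstrap SE is a good estimate of the complete-data posterior SD given $(Y_{\mathrm{obs}}, Y^{(m)}_{\mathrm{mis}})$, averaging the squared bootstrap SEs across imputations would seem to estimate exactly the within-imputation term.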
27th July 2016 - postscript - please see the comments below, in particular those from the authors, and the R code to perform the simulation study.