Summary statistics after imputation with mice

Someone recently asked how they could calculate summary statistics after performing multiple imputation with the mice package. The first thing to say is that if you are only interested in calculating a certain summary statistic on each of the imputed datasets, this is easy to achieve. You can extract each imputed dataset using the complete() function, and then apply whatever function you would normally use to calculate the summary statistic in question.

In the rest of this post, I’ll consider the situation where you are interested in performing inference for the summary statistic (or functional if you will). That is, if you are interested in say the median in your data, you are interested because ultimately you are interested in the median of the variable in the population (from which your sample data came from). Viewed this way, the summary statistic is an estimator of a population parameter, and so we should apply the usual procedure for multiple imputation: estimate the parameter on each imputed dataset and its corresponding complete data variance, and then pool these using Rubin’s rules. For some quantities (e.g. the mean), this is pretty easy. For others, at least as far as I can see, it requires a bit more work.

Read more