## How many imputations with mice? Assessing Monte-Carlo error after multiple imputation in R

When using multiple imputation to handle missing data, one must, if not immediately then eventually, decide how many imputations to base inferences on. The validity of inferences does not depend on the number of imputations used, but their statistical efficiency can be increased by using more imputations. Moreover, we may want our results to be reproducible to a given precision: if someone were to re-impute the same data using the same number of imputations but a different random number seed, they should obtain the same estimates to the desired precision. For a good summary of the considerations involved in choosing the number of imputations, see the corresponding section of Stef van Buuren's book, *Flexible Imputation of Missing Data*.
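To make the efficiency point concrete, here is a minimal Python sketch (Python only for illustration; it is not the post's R code) of Rubin's classical formula for the large-sample efficiency of an estimate based on m imputations relative to one based on infinitely many, (1 + γ/m)⁻¹, where γ is the fraction of missing information. The value γ = 0.3 used below is a hypothetical choice, not taken from any real analysis.

```python
def relative_efficiency(m: int, gamma: float) -> float:
    """Rubin's relative efficiency (1 + gamma/m)^-1 of m imputations vs. infinitely many.

    gamma is the fraction of missing information for the parameter of interest.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 + gamma / m)


if __name__ == "__main__":
    # Hypothetical fraction of missing information, chosen only for illustration.
    gamma = 0.3
    for m in (5, 20, 100):
        print(f"m = {m:3d}: relative efficiency = {relative_efficiency(m, gamma):.4f}")
```

With γ = 0.3 this gives roughly 0.943, 0.985 and 0.997 for m = 5, 20 and 100. This suggests why efficiency alone rarely calls for many imputations, and why the reproducibility criterion often ends up driving the choice.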

In this post I provide a small bit of R code which, given a pooled analysis after performing imputation using the `mice` package in R, calculates the so-called Monte-Carlo standard error of the multiple imputation point estimates. Stata has really nice functionality for this built into `mi estimate`.
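The post's code is in R; as a language-neutral illustration of the underlying calculation, here is a minimal Python sketch in which all function and field names are hypothetical. It pools per-imputation estimates and variances with Rubin's rules and, assuming the standard approximation that the Monte-Carlo standard error of the pooled point estimate is √(B/m), where B is the between-imputation variance and m the number of imputations, reports that as well. Inverting the same approximation gives a rough number of imputations needed to bring the Monte-Carlo error below a chosen target.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class PooledResult:
    estimate: float   # pooled point estimate (mean of per-imputation estimates)
    total_se: float   # Rubin's rules total standard error
    mc_se: float      # approximate Monte-Carlo SE of the pooled estimate, sqrt(B/m)
    between_var: float  # between-imputation variance B


def pool_with_mcse(estimates, variances):
    """Pool m per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b           # total variance
    return PooledResult(q_bar, math.sqrt(t), math.sqrt(b / m), b)


def imputations_for_target_mcse(b, target):
    """Smallest m with sqrt(b / m) <= target, treating b as known."""
    return math.ceil(b / target ** 2)


if __name__ == "__main__":
    # Hypothetical per-imputation results for a single coefficient.
    est = [1.0, 1.2, 0.9, 1.1, 1.3]
    var = [0.04] * 5
    res = pool_with_mcse(est, var)
    print(f"estimate {res.estimate:.4f}, SE {res.total_se:.4f}, MC SE {res.mc_se:.4f}")
    print("imputations for MC SE <= 0.01:", imputations_for_target_mcse(res.between_var, 0.01))
```

Because B is itself estimated from the current set of imputations, the suggested number of imputations is only a rough guide; one could re-check the Monte-Carlo error after re-imputing with the larger m.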

## Hypothetical estimands – a unification of causal inference and missing data methods

Camila Olarte Parra, Rhian Daniel and I have just released a pre-print on arXiv (now published in Statistics in Biopharmaceutical Research) detailing recent work on statistical methods targeting so-called hypothetical estimands in clinical trials. The ICH E9 addendum on estimands is having a widespread impact on the way clinical trials are planned and analysed. One of the strategies the addendum describes for handling so-called intercurrent events is the hypothetical strategy. Here one hypothesizes a way in which the trial could be modified such that the intercurrent event in question would not occur. For example, in trials where patients may receive a rescue medication, we could conceive of a trial in which such medication was not made available. The target of inference is then the treatment effect that would have been seen in such a modified trial.

In the paper, building on work by others (e.g. Lipkovich et al 2020), we show how causal inference concepts and methods can be used to define and estimate hypothetical estimands. Currently, estimands that use the hypothetical strategy are predominantly estimated using missing data methods such as mixed models and multiple imputation. To do so, any outcome measurements taken after the intercurrent event being handled by the hypothetical strategy are deleted (or ignored), and these methods are then applied under the assumption that the resulting missing data are missing at random (MAR). We set out to see how estimation of hypothetical estimands would proceed using the language and machinery of causal inference.
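The data-preparation step described above, setting post-event outcomes to missing, can be sketched as follows, assuming long-format data with one record per patient visit. The `Visit` record, its fields, the `mask_post_ice` helper, and the choice to treat only visits strictly after the event time as post-event are all hypothetical; the subsequent MAR-based analysis (for example multiple imputation or a mixed model) is not shown.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Visit:
    patient_id: str
    time: float               # hypothetical: visit time, e.g. weeks since randomisation
    outcome: Optional[float]  # None means the outcome is missing


def mask_post_ice(visits: List[Visit], ice_times: Dict[str, float]) -> List[Visit]:
    """Return copies of the visits with outcomes after each patient's intercurrent event set to None.

    ice_times maps patient_id to the time of that patient's intercurrent event;
    patients not in the mapping are assumed to have had no intercurrent event.
    """
    masked = []
    for v in visits:
        t_ice = ice_times.get(v.patient_id)
        if t_ice is not None and v.time > t_ice:
            # Outcome observed after the event: treat it as missing for the analysis.
            masked.append(replace(v, outcome=None))
        else:
            masked.append(v)
    return masked


if __name__ == "__main__":
    visits = [Visit("a", 4, 1.0), Visit("a", 8, 2.0), Visit("b", 8, 3.0)]
    # Hypothetical: patient "a" received rescue medication at week 6.
    for v in mask_post_ice(visits, {"a": 6}):
        print(v)
```

Keeping the original records unchanged and returning masked copies makes it easy to compare the observed and analysed data side by side.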

In this post I’ll highlight a few of the things the paper covers.