How many imputations with mice? Assessing Monte-Carlo error after multiple imputation in R

When using multiple imputation to handle missing data, one must eventually decide how many imputations to base inferences on. The validity of inferences does not rely on how many imputations are used, but the statistical efficiency of the inference can be increased by using more imputations. Moreover, we may want our results to be reproducible to a given precision, in the sense that if someone were to re-impute the same data using the same number of imputations but with a different random number seed, they would obtain the same estimates to the desired precision. For a great summary of the considerations involved in choosing the number of imputations, see the corresponding section of Stef van Buuren’s book.
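To make the efficiency point concrete, here is a sketch of Rubin's rules in standard textbook notation (the symbols are not taken from this post): m is the number of imputations, \hat{Q}_i and U_i are the estimate and its estimated variance from the i-th imputed dataset.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2,
\qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B .
```

The B/m part of the total variance T is the price paid for using a finite number of imputations. It shrinks towards zero as m grows, which is why more imputations give both more efficient and more reproducible estimates.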

In this post I provide a small bit of R code which, given a pooled analysis after performing imputation using the mice package in R, calculates the so-called Monte-Carlo standard error of the multiple imputation point estimates. Stata has really nice functionality for doing this built into mi estimate.
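As a rough illustration of the idea (a minimal sketch, not necessarily the code from the full post), the Monte-Carlo standard error of a pooled estimate can be taken as sqrt(B/m), where B is the between-imputation variance and m the number of imputations. The example below assumes a recent version of mice in which pool() returns a `pooled` data frame with `term`, `estimate` and `b` columns. The nhanes data, the analysis model and m = 20 are arbitrary choices for illustration.

```r
library(mice)

# Impute the small nhanes example data shipped with mice (m = 20 is arbitrary)
imp <- mice(nhanes, m = 20, seed = 6819, printFlag = FALSE)

# Fit the analysis model in each imputed dataset and pool using Rubin's rules
fit <- with(imp, lm(bmi ~ age + chl))
est <- pool(fit)

# est$pooled has one row per coefficient; column b is the
# between-imputation variance of that coefficient
mc_se <- sqrt(est$pooled$b / imp$m)

# Point estimates alongside their Monte-Carlo standard errors
data.frame(term = est$pooled$term,
           estimate = est$pooled$estimate,
           mc_se = mc_se)
```

If a Monte-Carlo standard error is too large relative to the precision you want to report, re-running with a larger m should reduce it roughly in proportion to 1/sqrt(m).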


Hypothetical estimands – a unification of causal inference and missing data methods

Camila Olarte Parra, Rhian Daniel and I have just released a pre-print on arXiv (now published in Statistics in Biopharmaceutical Research) detailing recent work on statistical methods targeting so-called hypothetical estimands in clinical trials. The ICH E9 addendum on estimands is having a widespread impact on the way clinical trials are planned and analysed. One of the strategies described by the addendum for handling so-called intercurrent events is the hypothetical strategy. This is where one hypothesizes a way in which the trial could be modified such that the intercurrent event in question would not take place. For example, in trials where patients may receive rescue medication, we could conceive of a trial in which such medication was not made available. The goal of inference is then the treatment effect we would have seen in such a modified trial.

In the paper, building on work by others (e.g. Lipkovich et al. 2020), we show how causal inference concepts and methods can be used to define and estimate hypothetical estimands. Currently, estimation of estimands that use the hypothetical strategy is predominantly carried out using missing data methods such as mixed models and multiple imputation. To do so, any outcome measurements available after the intercurrent event being handled with the hypothetical strategy are deleted or ignored, and an analysis using these methods is performed, assuming the resulting missing data are missing at random (MAR). We set out to see how estimation of hypothetical estimands would proceed using the language and machinery of causal inference.
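The data preparation step of the missing data approach can be sketched in a few lines of base R. This is only an illustration: the data frame, its values and the variable names (`rescue_visit`, `y_hyp`) are hypothetical and do not come from the paper.

```r
# Hypothetical long-format trial data: one row per patient visit
trial <- data.frame(
  id = rep(1:3, each = 3),
  visit = rep(1:3, times = 3),
  y = c(5.1, 4.8, 4.2, 6.0, 5.5, 5.9, 5.4, 5.0, 4.7),
  # visit from which rescue medication was in use (NA = never received)
  rescue_visit = rep(c(NA, 2, 3), each = 3)
)

# Flag outcomes measured once the intercurrent event has occurred
after_ice <- !is.na(trial$rescue_visit) & trial$visit >= trial$rescue_visit

# Set those outcomes to missing; a MAR-based method (e.g. a mixed model
# or multiple imputation) would then be applied to y_hyp
trial$y_hyp <- ifelse(after_ice, NA, trial$y)
```

Keeping the original `y` alongside `y_hyp` makes it easy to check exactly which measurements were discarded.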

In this post I’ll highlight a few of the things the paper covers.


Multiple imputation separately by groups in R and Stata

When using multiple imputation to impute missing values, there are often situations where one wants to perform the imputation completely separately within groups of subjects defined by some fully observed variable (e.g. sex or treatment group). In Stata, this is made very easy by the by() option. You simply tell the mi impute command which variable (or variables) you want to stratify the imputation on. Stata will then impute separately in the groups defined by the variable(s), and then assemble the imputations from each stratum back together so you have your desired number of imputed datasets.

Last week someone asked me how to do it in R, ideally with the mice package. Compared to Stata, one has to do a little bit more work. One approach is to use the mice.impute.bygroup function in the miceadds package, a package which extends the functionality of mice in various directions. If you instead want to do it manually, you can do so by making use of the rbind function within the mice package.
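A minimal sketch of the manual route might look like the following. The simulated data, the variable names and the choice of m = 5 are assumptions made purely for illustration. The key step is that rbind, applied to two mids objects with the same number of imputations, stacks the rows of the two groups back together.

```r
library(mice)

# Simulated example data: a fully observed group indicator and covariate x,
# and an outcome y with roughly 30% of values set to missing
set.seed(7234)
n <- 200
dat <- data.frame(group = rep(c(0, 1), each = n / 2), x = rnorm(n))
dat$y <- 1 + dat$group + dat$x + rnorm(n)
dat$y[runif(n) < 0.3] <- NA

# Impute each group separately with the same number of imputations.
# group is constant within each subset, so mice drops it as a predictor
# and records this as a logged event
imp0 <- mice(dat[dat$group == 0, ], m = 5, seed = 1, printFlag = FALSE)
imp1 <- mice(dat[dat$group == 1, ], m = 5, seed = 2, printFlag = FALSE)

# Stack the two sets of imputations into a single mids object
imp_all <- rbind(imp0, imp1)

# Analyse the combined imputations as usual
fit <- with(imp_all, lm(y ~ group + x))
summary(pool(fit))
```

Because each group gets its own imputation model, relationships between variables are allowed to differ freely between groups, which is the point of imputing separately.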
