Hypothetical estimands – a unification of causal inference and missing data methods

Camila Olarte Parra, Rhian Daniel and I have just released a pre-print on arXiv (now published in Statistics in Biopharmaceutical Research) detailing recent work on statistical methods targeting so-called hypothetical estimands in clinical trials. The ICH E9 addendum on estimands is having a widespread impact on the way clinical trials are planned and analysed. One of the strategies described by the addendum for handling so-called intercurrent events is the hypothetical strategy. This is where one envisages a way in which the trial could be modified such that the intercurrent event in question would not take place. For example, in trials where patients may receive rescue medication, we could conceive of a trial where such medication was not made available. The target of inference is then the treatment effect we would have seen in such a modified trial.

In the paper, building on work by others (e.g. Lipkovich et al 2020), we show how causal inference concepts and methods can be used to define and estimate hypothetical estimands. Currently, estimation of estimands that use the hypothetical strategy is predominantly carried out using missing data methods such as mixed models and multiple imputation. To do so, any outcome measurements taken after the intercurrent event being handled by the hypothetical strategy are discarded, and the analysis is then performed using these methods, assuming the resulting missing data are missing at random (MAR). We set out to see how estimation of hypothetical estimands would proceed using the language and machinery of causal inference.
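To make the current missing data approach concrete, here is a minimal R sketch using the mice package. It is an illustration under my own assumptions rather than code from the paper: the simulated trial, the variable names (trt, baseline, rescue, y, y_hyp) and the data generating model are all hypothetical, and rescue is made to depend only on baseline so that MAR given baseline holds by construction.

```r
library(mice)

set.seed(2024)
n <- 300

# Hypothetical simulated trial: randomised arm, one baseline covariate
trial <- data.frame(trt = rbinom(n, 1, 0.5), baseline = rnorm(n))

# Intercurrent event: receipt of rescue medication (assumed to depend on baseline only)
trial$rescue <- rbinom(n, 1, plogis(-1 - 0.5 * trial$baseline))

# Outcome as actually observed in the trial
trial$y <- 2 + 0.8 * trial$baseline + 1 * trial$trt + rnorm(n)

# Hypothetical strategy: discard outcomes measured after the intercurrent event
trial$y_hyp <- ifelse(trial$rescue == 1, NA, trial$y)

# Impute the discarded outcomes assuming MAR given arm and baseline
imp <- mice(trial[, c("trt", "baseline", "y_hyp")],
            m = 10, printFlag = FALSE, seed = 99)

# Analyse each imputed dataset and pool using Rubin's rules
fit <- with(imp, lm(y_hyp ~ trt + baseline))
summary(pool(fit))
```

In a real trial the variables in the imputation model would need to be chosen so that the MAR assumption is plausible; here it is simply built into the simulation.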

In this post I’ll highlight a few of the things the paper covers.

Multiple imputation separately by groups in R and Stata

When using multiple imputation to impute missing values, there are often situations where one wants to perform the imputation completely separately in groups of subjects defined by some fully observed variable (e.g. sex or treatment group). In Stata, this is made very easy through the by() option. You simply tell the mi impute command which variable (or variables) you want to stratify the imputation on. Stata will then impute separately in the groups defined by the variable(s), and then assemble the imputations from each stratum back together so you have your desired number of imputed datasets.

Last week someone asked me how to do it in R, ideally with the mice package. Compared to Stata, one has to do a little more work. One approach is to use the mice.impute.bygroup function in the miceadds package, which extends the functionality of mice in various directions. If you instead want to do it manually, you can do so by making use of the rbind function within the mice package.
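A minimal sketch of the manual route is shown below. The simulated data and the names (dat, trt, x, y) are hypothetical and chosen only for illustration. The idea is to run mice separately on each subset, using the same number of imputations in each, and then stack the resulting objects with rbind.

```r
library(mice)

set.seed(7234)
n <- 200

# Hypothetical data: fully observed group (trt) and covariate (x), outcome y partly missing
dat <- data.frame(trt = rep(c(0, 1), each = n / 2), x = rnorm(n))
dat$y <- 1 + dat$x + 0.5 * dat$trt + rnorm(n)
dat$y[runif(n) < 0.3] <- NA

# Impute separately within each group, with the same m in both calls.
# trt is constant within each subset, so mice may record a logged event
# when it drops trt as a predictor; that is expected here.
imp0 <- mice(dat[dat$trt == 0, ], m = 5, printFlag = FALSE, seed = 1)
imp1 <- mice(dat[dat$trt == 1, ], m = 5, printFlag = FALSE, seed = 2)

# Stack the two sets of imputations by rows into a single imputation object
imp_all <- rbind(imp0, imp1)

# Analyse the combined imputations as usual and pool with Rubin's rules
fit <- with(imp_all, lm(y ~ trt + x))
summary(pool(fit))
```

The combined object can then be used just like one returned by a single call to mice, for example with complete() to extract an individual imputed dataset.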