Estimating hypothetical estimands with causal inference and missing data estimators in a diabetes trial

Together with Camila Olarte Parra (LSHTM), Rhian Daniel (Cardiff) and David Wright (AstraZeneca), I recently posted a new paper on arXiv. It explores the use of estimators from both the causal inference and missing data literatures for estimating a so-called hypothetical estimand in a previously conducted clinical trial in diabetes.

When targeting hypothetical estimands, traditionally any data collected after the so-called intercurrent event takes place are set to missing, and missing data methods are used. The most common approach, with a continuous outcome measured repeatedly over time, is to use a mixed model for repeated measures (MMRM). We also explored the use of multiple imputation and inverse probability weighting. Both can adjust for post-baseline predictors of the intercurrent event and outcome, which makes the missing at random assumption more plausible.
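The first step of this traditional approach, setting post-event outcomes to missing, can be sketched in a few lines of Python. This is only an illustration and not the code used in the paper: the record layout, the field names (`ice_visit` and so on) and the values are all made up, and it assumes that an outcome counts as "after" the intercurrent event when its visit number is strictly greater than the visit at which the event was recorded.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Visit:
    patient_id: int
    visit: int
    outcome: Optional[float]   # None means the outcome is missing
    ice_visit: Optional[int]   # hypothetical field: visit at which the
                               # intercurrent event was recorded, None if never


def mask_post_ice(rows: List[Visit]) -> List[Visit]:
    """Return a copy of rows with outcomes after the intercurrent event set to missing."""
    masked = []
    for row in rows:
        # Assumption: "after" means a strictly later visit than ice_visit.
        after_ice = row.ice_visit is not None and row.visit > row.ice_visit
        masked.append(replace(row, outcome=None) if after_ice else row)
    return masked


# Made-up data: patient 1 has no intercurrent event, patient 2 has one at visit 1.
rows = [
    Visit(1, 1, 8.1, None), Visit(1, 2, 7.9, None), Visit(1, 3, 7.6, None),
    Visit(2, 1, 8.4, 1), Visit(2, 2, 8.0, 1), Visit(2, 3, 7.2, 1),
]
masked = mask_post_ice(rows)
```

The masked records would then be passed to whichever missing data method is chosen (MMRM, multiple imputation or inverse probability weighting). The original records are left unchanged, so the post-event outcomes remain available for methods that can use them.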

However, it is also possible to exploit data collected after the intercurrent event takes place, offering the potential of estimates with improved precision. This is possible by using methods from causal inference: G-formula, as we described in a 2022 paper, and G-estimation, as described by Lasch et al. (2022).
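As a rough illustration of the standardisation idea behind G-formula, the sketch below handles a deliberately simplified setting: one post-baseline covariate, one binary intercurrent event indicator and a final outcome. It is not the longitudinal analysis in the paper, it does not use any package mentioned in this post, and it does not show how post-event data can be exploited. The function name, record layout and numbers are hypothetical, and the sketch assumes that, within each arm, the covariate captures all common causes of the intercurrent event and the outcome.

```python
from collections import defaultdict
from statistics import fmean


def g_formula_mean(records, arm):
    """Standardised mean outcome in `arm` had no intercurrent event occurred.

    records: iterable of (arm, covariate_level, ice, outcome) tuples, where
    ice is 1 if the intercurrent event occurred and 0 otherwise.
    """
    in_arm = [r for r in records if r[0] == arm]
    if not in_arm:
        raise ValueError("no records in the requested arm")

    # Group the arm's records by covariate level.
    by_level = defaultdict(list)
    for _, level, ice, outcome in in_arm:
        by_level[level].append((ice, outcome))

    total = 0.0
    for level, rows in by_level.items():
        event_free = [y for ice, y in rows if ice == 0]
        if not event_free:
            raise ValueError(f"no event-free records at covariate level {level!r}")
        # Mean outcome among event-free patients at this level, weighted by
        # how common the level is among ALL patients in the arm.
        total += fmean(event_free) * len(rows) / len(in_arm)
    return total


# Made-up data for one arm: the event is more common at covariate level 1.
records = (
    [(1, 0, 0, 7.0)] * 3 + [(1, 0, 1, 8.0)]
    + [(1, 1, 0, 6.0)] + [(1, 1, 1, 9.0)] * 3
)
standardised = g_formula_mean(records, arm=1)
naive = fmean(y for _, _, ice, y in records if ice == 0)
```

With these made-up numbers the standardised mean differs from the naive mean among event-free patients, because the naive mean over-represents the covariate level at which the event is rare.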

One of the practical issues to overcome when implementing the methods was handling missing data, which inevitably occurs to varying extents. While MMRM and multiple imputation can accommodate this essentially automatically (under a missing at random assumption), for inverse probability weighting we first multiply imputed the data. For G-formula we used the gfoRmula package in R. When fitting each of the models it requires, the package excludes records with missing values, i.e. it performs a so-called complete case analysis. An alternative is to implement G-formula using multiple imputation methods, which as a by-product can be used to impute any missing data in the original dataset. We also combined G-estimation with multiple imputation to handle the missing data.
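When an estimator is applied separately to each multiply imputed dataset, the results need to be combined. A standard way of doing this is Rubin's rules, sketched below in Python. This post does not say which pooling approach was used for each method in the paper, so treat this as a generic illustration. The function name and the numbers are made up.

```python
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances using Rubin's rules.

    Returns the pooled point estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations, with one variance per estimate")

    pooled = fmean(estimates)        # average of the point estimates
    within = fmean(variances)        # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up treatment effect estimates and variances from three imputed datasets.
estimates = [-0.50, -0.40, -0.60]
variances = [0.04, 0.05, 0.03]
pooled, total_var = pool_rubin(estimates, variances)
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across imputations, which reflects the extra uncertainty due to the missing data.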

The paper gives detailed guidance on choosing between the methods, as well as detailed descriptions of how we implemented each of them in R.

This work was supported by a UK Medical Research Council grant (MR/T023953/1).
