Variance estimation for reference-based multiple imputation – the debate continues

Reference-based imputation methods have become a popular approach to handling missing data in clinical trials after patients experience what is nowadays referred to as an intercurrent event. Roughly speaking, these approaches impute such missing data in one treatment group (e.g. those in the active treatment group) based to some extent on estimates of parameters from another treatment group (e.g. the control treatment group). The approach was proposed in a paper by my colleague James Carpenter and others in 2013.

Shortly after publication, Seaman et al raised a concern about the approach – that Rubin’s variance estimator gives variance estimates that are larger (on average) than the repeated sampling variance of the treatment effect estimator obtained by using reference-based MI. The source of the discrepancy is that the reference-based imputation models are ‘uncongenial’ with the complete data analysis (which compares mean outcomes between randomised groups).
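As a reminder of what Rubin's variance estimator computes, here is a minimal sketch of Rubin's rules for pooling results across imputed datasets. The function name `rubin_pool` and its arguments are my own illustrative choices, not taken from any of the papers discussed. It assumes each imputed dataset has already been analysed to give a treatment effect estimate and its complete-data variance estimate.

```python
from statistics import mean, variance


def rubin_pool(estimates, within_variances):
    """Pool M imputation-specific results using Rubin's rules.

    estimates        -- treatment effect estimate from each imputed dataset
    within_variances -- complete-data variance estimate from each imputed dataset
    (Both names are hypothetical, chosen for illustration.)
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled_estimate = mean(estimates)         # average of the M estimates
    within = mean(within_variances)           # average within-imputation variance
    between = variance(estimates)             # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between    # Rubin's total variance
    return pooled_estimate, total
```

The concern raised by Seaman et al is about the quantity `total` here: with reference-based imputation it tends, on average, to exceed the variance of `pooled_estimate` across repeated samples.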

James and colleagues replied, arguing against the idea of using (estimates of) the true repeated sampling variance of the reference-based MI treatment effect estimator, on the basis that it has a rather unusual behaviour: when there is more missing data, the variability of the estimator decreases in repeated samples. As such, James and colleagues argued that the repeated sampling variance ought not to be used. They argued that Rubin’s variance estimator remains appropriate to use, because its value (on average) increases as the amount of missing data increases.

Suzie Cro and colleagues then extended these ideas, showing that Rubin’s variance estimator possesses what was termed an information-anchored property. The information-anchoring principle states that, in the context of a missing data sensitivity analysis, the proportion of information about the treatment effect lost due to missing data should be the same in the primary analysis and in a corresponding sensitivity analysis which varies assumptions about the missing data. They showed that reference-based MI methods approximately satisfy this principle when an MAR based analysis is used as the primary analysis.

In this paper I reviewed the arguments for and against using the repeated sampling variance as opposed to Rubin’s variance estimator in this context, ultimately expressing a personal preference for the repeated sampling variance. I came to this view essentially because I think that if we are operating in a frequentist framework, we should use methods that satisfy the usual frequentist criteria. In particular, 95% confidence intervals ought to include the true parameter 95% of the time when our modelling assumptions hold. If one uses Rubin’s variance estimator, the coverage of the resulting confidence intervals is typically above 95%. This also translates into a loss of statistical power relative to what would be obtained using the (smaller) repeated sampling variance. The unusual behaviour of reference-based estimators, where the repeated sampling variance decreases as the proportion of patients experiencing the intercurrent event (and hence (usually) having missing data) increases, is though a logical consequence of the reference-based assumptions. If this consequence is not deemed acceptable, it probably means we don’t fully believe in the assumptions. If we don’t, we should (in my view) alter the assumptions of our analysis.

Recently Suzie, James and others wrote a letter, just published, in response to a paper published by Marcel Wolbers and other colleagues at Roche, plus myself, where we proposed a conditional mean approach to implementing MAR and reference-based imputation. In that paper, although we briefly discussed the issue of variance estimation, we focused on the repeated sampling variance, and estimated this using the jackknife method. In their letter Suzie et al caution regulators and statisticians about using our conditional mean imputation approach and the repeated sampling variance for reference-based MI methods because of the latter’s unusual behaviour (i.e. getting smaller when more patients have the intercurrent event and hence (usually) missing data). Alongside their letter, we have published a rejoinder letter, re-articulating our reasons for favouring the repeated sampling variance, but also showing that the conditional mean approach can be used to obtain an information-anchored variance, if this is desired.
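To illustrate the idea behind the jackknife, here is a minimal generic sketch of the delete-one jackknife variance estimator. This is not the implementation used in our paper: the function name `jackknife_variance` and the `estimator` argument are hypothetical. In the trial setting, I am assuming that each element of `data` would be one patient's record and that `estimator` would re-run the whole conditional mean imputation and analysis on the reduced dataset.

```python
from statistics import mean


def jackknife_variance(data, estimator):
    """Delete-one jackknife estimate of the variance of estimator(data).

    data      -- sequence of independent units (e.g. one entry per patient)
    estimator -- function mapping a list of units to a single number
    (Both names are hypothetical, chosen for illustration.)
    """
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two units")

    # Re-compute the estimate n times, each time leaving out one unit.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)

    # Standard jackknife scaling factor (n - 1) / n.
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)
```

For the simple case where `estimator` is the sample mean, this reproduces the familiar sample variance divided by n, which is a useful sanity check before applying it to something more complicated.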

As I wrote in my earlier discussion paper, the question of variance estimation in this context is clearly of practical importance, and so I am glad that the topic continues to receive scrutiny and discussion!
