Reference based multiple imputation – what’s the right variance and how to estimate it?

Reference based multiple imputation methods have become a popular approach for handling missing data in the analysis of randomised trials (Carpenter et al 2013). Very roughly speaking, they impute missing outcomes in patients in the active arm assuming that the missing outcomes behave as if the patient had switched onto the control treatment. This is in contrast to what is now the standard approach, based on the missing at random assumption, which effectively imputes missing outcomes for patients in a given arm as if they had remained on the treatment they were randomised to.
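To give a flavour of the mechanics, here is a deliberately stripped-down sketch of a 'jump to reference'-style imputation: a single normally distributed outcome, hypothetical arm means, and dropout only in the active arm. Real reference-based MI is built on a joint multivariate normal model for the repeated measures, so this is only a caricature; all names and parameter values below are mine, not from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy trial: control mean 0, active mean 1, common SD 1 (hypothetical values)
n = 200
arm = rng.integers(0, 2, n)            # 0 = control, 1 = active
y = rng.normal(loc=arm * 1.0, scale=1.0)

# 30% of active-arm outcomes are missing; control outcomes fully observed
observed = rng.random(n) > 0.3 * arm

# MAR-style imputation would draw missing active-arm outcomes from the
# active arm's distribution. Jump to reference instead draws them from
# the *control* arm's observed outcome distribution.
control_obs = y[(arm == 0) & observed]
missing_active = (arm == 1) & ~observed
y_j2r = y.copy()
y_j2r[missing_active] = rng.choice(control_obs, missing_active.sum())
```

In a real analysis this draw would come from the fitted control-arm model (with parameter uncertainty propagated), and would condition on each patient's observed pre-dropout measurements.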

Soon after reference based MI methods were proposed, people started noticing that Rubin's rules variance estimator, which is the standard approach for analysing multiply imputed datasets, overstated the variance of treatment effects compared to the true frequentist variance of the effect estimator (Seaman et al 2014). This means that if Rubin's rules are used, the type 1 error rate will be below the nominal 5% level, and power will be lower (sometimes substantially) than if the frequentist variance were used for inference.
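For reference, Rubin's rules pool the point estimates from the M imputed datasets and combine the within- and between-imputation variances as W + (1 + 1/M)B; a minimal sketch (the numeric inputs are made up for illustration):

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool M point estimates and their within-imputation variances
    using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    theta_hat = estimates.mean()        # pooled point estimate
    w = variances.mean()                # within-imputation variance W
    b = estimates.var(ddof=1)           # between-imputation variance B
    total_var = w + (1 + 1 / m) * b     # Rubin's total variance
    return theta_hat, total_var

# Hypothetical treatment effect estimates from M = 5 imputed datasets
est, tv = rubins_rules([1.2, 1.0, 1.3, 1.1, 1.4], [0.04] * 5)
```

The bias issue with reference-based methods is that this total variance targets something larger than the repeated-sampling variance of the pooled estimator, because the imputation and analysis models are uncongenial.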

In a new pre-print on arXiv I review the congeniality issue and the bias in Rubin’s variance estimator, and summarise some of the arguments made in favour of and against using Rubin’s rules with reference based methods. In the end I personally conclude that the frequentist variance is the ‘right’ one, but that we should scrutinise further whether the reference based assumptions are reasonable in light of the behaviour they cause for inferences. For instance, they lead to a situation where the more data are missing, the more certain we are about the value of the treatment effect, which would ordinarily seem incorrect.

I also review different approaches for estimating the frequentist variance, should one decide it is of interest, including efficiently combining bootstrapping with multiple imputation, as proposed by Paul von Hippel and myself in a paper (in press at Statistical Science) and available to view here.
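The simplest (if computationally inefficient) route to a frequentist variance is to bootstrap the entire impute-then-analyse procedure; the proposals in the paper are about doing this efficiently with only a couple of imputations per bootstrap sample, but the basic idea can be sketched as follows, using toy data and a deliberately crude single-variable imputation model (all names and settings are mine):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: an outcome with roughly 30% of values missing
y = rng.normal(size=200)
y[rng.random(200) < 0.3] = np.nan

def mi_estimate(data, m, rng):
    """Crude MI for the mean: fill missing values with draws from the
    observed-data normal distribution, return the pooled estimate."""
    obs = data[~np.isnan(data)]
    ests = []
    for _ in range(m):
        filled = data.copy()
        n_mis = int(np.isnan(data).sum())
        filled[np.isnan(data)] = rng.normal(obs.mean(), obs.std(ddof=1), n_mis)
        ests.append(filled.mean())
    return float(np.mean(ests))

# Nonparametric bootstrap of the whole impute-then-analyse pipeline:
# the SD of the pooled estimates across bootstrap resamples estimates
# the frequentist standard error, without relying on Rubin's rules
boot_ests = []
for _ in range(500):
    idx = rng.integers(0, len(y), len(y))
    boot_ests.append(mi_estimate(y[idx], m=2, rng=rng))
se_freq = np.std(boot_ests, ddof=1)
```

One attraction of resampling-based variance estimation here is that the imputation step need not be 'proper' in Rubin's sense, since parameter uncertainty is handled by the bootstrap rather than by the imputation model.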

I hope the paper stimulates further debate as to what the right variance is for reference based methods, and would very much welcome any comments on it.

19th July 2021 – a short talk about this work can be viewed here.

22nd September 2021 – this work has now been published in the journal Statistics in Biopharmaceutical Research, and is available open-access here.

Non-proportional hazards – an introduction to their possible causes and interpretation

I had the pleasure today to participate in a PSI event on non-proportional hazards and applications in immuno-oncology. Non-proportional hazards are increasingly encountered in clinical trials, and there remain important questions about how to analyse trials when non-proportional hazards could occur. These include questions about how to formulate an appropriate hypothesis test for assessing evidence of benefit of the new treatment over the control and how best to quantify the treatment effect. The talks were really very interesting, and led me to believe there is still lots of important work to be done in this area.

Here are the slides of my talk in case they are of interest, where I discuss some of the subtleties involved in interpreting changes in hazards and hazard ratios over time, which are complicated by the ubiquitous presence of frailty effects. I’ve posted on this topic previously quite a lot – for those interested see the related posts below.
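One standard frailty result that illustrates the subtlety: even if every individual's hazard ratio is constant over time, the population-average (marginal) hazard ratio attenuates towards 1, because higher-risk individuals are selectively depleted from the higher-hazard arm first. With a gamma-distributed frailty with mean 1 and variance theta multiplying a constant individual-level hazard, the marginal hazard at time t is h(t) / (1 + theta * Λ(t)), so the attenuation can be computed in closed form (the parameter values below are arbitrary illustrations):

```python
import numpy as np

# Individual-level (conditional) hazard: constant lam in controls, with a
# constant treatment hazard ratio r; gamma frailty with mean 1, variance theta
lam, r, theta = 0.1, 0.5, 1.0

def marginal_hr(t):
    """Marginal hazard ratio at time t under gamma frailty.

    With cumulative hazard lam*t (or r*lam*t under treatment), the
    marginal hazard is the conditional hazard divided by 1 + theta
    times the cumulative hazard."""
    h0 = lam / (1 + theta * lam * t)            # control marginal hazard
    h1 = r * lam / (1 + theta * r * lam * t)    # treated marginal hazard
    return h1 / h0

t = np.array([0.0, 10.0, 100.0])
hrs = marginal_hr(t)   # starts at r = 0.5, rises towards 1 as t grows
```

So an observed hazard ratio that moves towards 1 over follow-up need not reflect a waning individual-level treatment effect at all; it can be produced purely by frailty-induced selection.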

From DAGs to potential outcomes via Single World Intervention Graphs

Directed acyclic graphs (DAGs) are an exceptionally useful tool for graphically depicting assumptions about causal structure. An accompanying rich theory has been developed which enables one to (for example) determine whether there exist sets of variables which, if adjusted for, would enable estimation of the causal effect of one variable in the DAG (the exposure) on another (the outcome). Personally I have found them very useful for thinking about missing data assumptions (see ‘Understanding missing at random dropout using DAGs‘, for example).

Separately, I have similarly found the concept of potential outcomes (counterfactuals), as used extensively by Jamie Robins, very useful. The idea of defining causal effects as the difference between what one would observe if an exposure were set to one level as opposed to another is extremely intuitive.

For many years I must admit I did not even notice that there was something of a disconnect between DAGs and potential outcomes, in the sense that if you draw a DAG encoding your causal assumptions about a process in the world, the DAG does not contain any potential outcomes. As I noted earlier, DAGs have rules/conditions for when one can estimate the causal effect of an exposure in the presence of confounding, and the potential outcomes framework similarly has conditions sufficient to estimate the causal effect of exposure. But because the DAG doesn’t contain potential outcomes, it seemed difficult to directly connect these two frameworks. In particular, the no confounding or exchangeability assumption in the potential outcomes framework seemingly can’t be checked from a DAG, since the DAG doesn’t contain the potential outcomes.

This post is about Single-World Intervention Graphs, which for me felt like a bit of a revelation when I discovered them. They allow one to take a DAG and determine what would happen if one were to intervene to set the value of certain variables to certain values. In doing so, potential outcomes emerge into the graph, and enable us to (for example) check the exchangeability assumption. I draw heavily on Hernán and Robins’ Causal Inference book.
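The exchangeability statements that SWIGs let you read off graphically can also be seen in a small simulation in which the potential outcomes are generated explicitly (all variable names and probabilities below are my own illustrative choices): marginally, the potential outcome under exposure is associated with the actual exposure because of a common cause, but conditional on that common cause exchangeability holds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# L is a common cause of exposure A and the potential outcome Y1 = Y(a=1)
L = rng.integers(0, 2, n)
A = rng.binomial(1, 0.2 + 0.6 * L)      # exposure more likely when L = 1
Y1 = rng.binomial(1, 0.3 + 0.4 * L)     # potential outcome under a = 1

# Marginal exchangeability Y(1) independent of A fails: the exposed have
# higher Y1 on average purely because they have higher L
p1, p0 = Y1[A == 1].mean(), Y1[A == 0].mean()

# Conditional exchangeability given L holds: within a level of L, the
# Y1 distribution is the same whether A = 1 or A = 0 was actually taken
d_cond = abs(Y1[(A == 1) & (L == 1)].mean() - Y1[(A == 0) & (L == 1)].mean())
```

A SWIG for this structure, formed by splitting the A node, shows Y(1) d-separated from A given L but not marginally, matching what the simulation displays numerically.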

Read more