How to interpret hazard ratios

Survival analysis of time-to-event outcomes is most commonly performed using Cox’s proportional hazards model, which estimates hazard ratios for the ‘effects’ of covariates. Starting with Hernán’s ‘The Hazards of Hazard Ratios’ paper, hazard ratios have been investigated and critiqued from a causal inference perspective. Following this, Aalen and colleagues wrote an important paper on whether analysis of a randomised trial using Cox’s model yields a causal effect, and a number of more recent papers have investigated the issue further. The criticisms and complexity arise from the definition of the hazard and the presence of so-called frailty factors – unmeasured variables which influence when someone has the event of interest.
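To illustrate the frailty issue concretely, here is a small simulation sketch (my own hypothetical example, with arbitrary parameter choices, not taken from any of the papers mentioned): every individual has a hazard ratio of exactly 0.5, yet an unmeasured gamma frailty makes the population-level hazard ratio drift towards 1 over time, because high-frailty subjects are selected out of the control arm faster than the treated arm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Unmeasured frailty Z ~ Gamma(1, 1) multiplying each subject's hazard
z0 = rng.gamma(1.0, 1.0, size=n)  # control arm frailties
z1 = rng.gamma(1.0, 1.0, size=n)  # treated arm frailties

# Individual hazards: control rate z, treated rate 0.5*z (individual HR = 0.5)
t_control = rng.exponential(1.0 / z0)
t_treated = rng.exponential(1.0 / (0.5 * z1))

def interval_hazard(times, t0, t1):
    """Crude discrete-time hazard: fraction of subjects still at risk at t0
    who have the event before t1."""
    at_risk = times >= t0
    events = at_risk & (times < t1)
    return events.sum() / at_risk.sum()

# Marginal (population-level) hazard ratio in successive time windows
hrs = [interval_hazard(t_treated, t0, t0 + 0.2) /
       interval_hazard(t_control, t0, t0 + 0.2)
       for t0 in (0.0, 2.0, 5.0)]
print([round(h, 2) for h in hrs])
# Early on the marginal HR is close to the individual-level 0.5, but in later
# windows it moves towards 1, despite no individual's hazard ratio changing.
```

The selection effect means the marginal hazard ratio at later times conditions on survival, which differs between arms in its frailty composition – the core of the interpretational difficulty.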

I have briefly blogged about this topic before, in particular about the causal interpretation of the hazard ratio when the proportional hazards assumption holds. I’m really pleased to have now (finally!) finished a short expositional article with colleagues Dominic Magirr and Tim Morris about how we think hazard ratios should be interpreted. Using a simple example, we review the key issue arising from the effects of frailty, articulate how we think hazard ratios ought to be interpreted, and argue that the hazard ratio should be viewed as a causal quantity. A pre-print of our article is available now on arXiv.

Research Fellow post at LSHTM – machine learning for missing data

We are currently recruiting for a Research Fellow position at the London School of Hygiene & Tropical Medicine to work on an exciting new project that will develop machine learning based methods for handling missing data in statistical analyses. The project, funded by the UK’s Economic and Social Research Council, will develop new missing data methods based on recent developments in double or debiased machine learning. The project team includes myself (Jonathan Bartlett), Shaun Seaman at the MRC Biostatistics Unit, and Richard Silverwood from UCL.

The post will be for 3.5 years, and we are accepting applications until 30th September. For further details on the role and to apply, please see the LSHTM jobs site.

The role of post intercurrent event data in the estimation of hypothetical estimands in clinical trials

Clinical trial estimands which make use of the so-called hypothetical strategy target the effect of one randomised treatment compared to another in a scenario where the corresponding intercurrent event does not happen. Historically, estimation of such estimands has made use of established techniques for handling missing data, setting any data observed after the intercurrent event to missing.

In the last few years it has been shown that data after the intercurrent event can be used for estimation of such hypothetical estimands, using methods such as G-formula and G-estimation from causal inference. These offer the potential for increased statistical power, but rely on making certain assumptions about how the intercurrent event influences subsequent outcomes. In a new pre-print available on arXiv, Rhian Daniel and I examine further the role of such post intercurrent event data in estimation of hypothetical estimands.

In the paper we:

  • show certain G-formula estimators are identical to certain G-estimators, something which is not obvious from their construction
  • show these estimators can only improve efficiency and power by making additional assumptions not required by estimators that do not use data observed after the intercurrent event (such as missing data imputation estimators)
  • show the gain in efficiency/power will typically be modest, since in most trials the rate of such intercurrent events is not too large
  • argue that the additional assumptions necessary will often not be plausible on clinical grounds

As such, we conclude by recommending that estimation of estimands that adopt the hypothetical strategy continue to be based on estimators that do not use data after the intercurrent event occurs. This involves setting any data observed after the intercurrent event to missing and handling the resulting missing counterfactual (no intercurrent event) outcomes using missing data methods, such as multiple imputation or inverse probability weighting.
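The recommended approach can be sketched with a small simulation. This is my own hypothetical illustration (the covariate, intercurrent event mechanism, and effect size are all invented), using single conditional-mean imputation for brevity; in practice one would use multiple imputation or inverse probability weighting to obtain valid standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated two-arm trial: baseline covariate x, randomised treatment z,
# and the outcome y_hypo that would be seen had the intercurrent event (ICE),
# e.g. rescue medication, not occurred -- this is the estimand's target
x = rng.normal(size=n)
z = rng.integers(0, 2, size=n)
y_hypo = 1.0 * z + 0.8 * x + rng.normal(size=n)  # true treatment effect = 1.0

# ICE occurrence depends on the baseline covariate x only
ice = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 1.0)))

# Per the recommendation: discard (set to missing) outcomes after the ICE
y_obs = np.where(ice, np.nan, y_hypo)

# Impute the missing counterfactual outcomes from a regression of y on (z, x)
# among completers, assuming missingness at random given z and x
comp = ~np.isnan(y_obs)
X = np.column_stack([np.ones(n), z, x])
beta, *_ = np.linalg.lstsq(X[comp], y_obs[comp], rcond=None)
y_imp = np.where(comp, y_obs, X @ beta)

# Estimated treatment effect for the hypothetical estimand
effect = y_imp[z == 1].mean() - y_imp[z == 0].mean()
print(f"estimated effect: {effect:.2f} (true value 1.0)")
```

The key point is that no data observed after the intercurrent event enters the analysis: identification rests only on the missingness-at-random assumption, not on assumptions about how the intercurrent event influences subsequent outcomes.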