Today I listened to a great Royal Statistical Society webinar, with Alan Phillips and Peter Diggle (current RSS president) presenting. The topic was a particularly hot one in the clinical trials world right now, namely estimands.

Alan's presentation gave an excellent overview of the work of a PSI/EFSPI special interest group on estimands. Topics discussed included defining exactly what is meant by an estimand, whether there should be a standardised set of estimands which could be used across trials conducted in different disciplines, and what the estimand discussion means in terms of implementation and statistical analysis.

Listening to Alan, I found myself in agreement with everything he said. In particular, he stressed the importance of being clear, in any given trial (or indeed any study), about what the scientific question(s) is, which then leads to consideration of what the target estimand (parameter) ought to be.

## A hypothetical case study

As a case study, Alan discussed a hypothetical randomised trial in diabetes, with two arms, and HbA1c used as an outcome, measured repeatedly over time in patients. In this trial, patients are to be given 'rescue' medication if their HbA1c exceeds a certain threshold.

Alan then supposed that each patient was followed up for HbA1c to the end of the intended follow-up period, irrespective of whether they received the rescue medication. In this case, there are no missing data. This scenario contrasts with what has often been done in trials in the past, where once the patient's treatment in some way deviates from what was intended (e.g. through treatment discontinuation or receipt of rescue medication), outcome data are no longer collected on the patient.

In the rest of the post, I'll expand a bit on possible estimands, using this case study as a running example, and on some of the challenges that I think we will face, particularly with so-called 'de jure' estimands.

## De facto estimand

In this case (of no missing data), to estimate the effect of randomisation (sometimes referred to as the de facto estimand), we can simply compare the outcomes at the final follow-up time point, for example using a two-sample t-test. Importantly, in the absence of any missing actual data, this analysis does not really rely on any strong statistical modelling assumptions, particularly in light of the robustness of the t-test.
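As a minimal sketch of such a de facto analysis, the following simulates final-visit HbA1c values for two arms of a hypothetical trial (the sample sizes, means and standard deviations are made-up numbers purely for illustration) and compares the arms with a two-sample (Welch) t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical final-visit HbA1c (%) in each randomised arm
# (simulated values; the trial in the post is hypothetical too)
control = rng.normal(loc=8.0, scale=1.0, size=100)
treated = rng.normal(loc=7.5, scale=1.0, size=100)

# De facto analysis: compare outcomes at final follow-up as randomised,
# regardless of any rescue medication received along the way
estimate = treated.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)

print(f"estimated difference in means: {estimate:.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Because every patient contributes an observed final outcome, the comparison needs essentially no modelling assumptions beyond those of the t-test itself.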

## De jure estimands

Depending on the context, our primary interest may lie in an alternative estimand. In the present hypothetical example, the de facto estimand is affected if many patients end up receiving rescue medication, because their outcomes will then likely have been influenced by it. The de facto estimand tells us what happens to the outcome variable, on average, if you randomise a patient to one arm versus the other, including the effects of all the various things that might happen (e.g. rescue medication, treatment discontinuation, etc.).

An alternative estimand that may be of interest is the effect that would have been seen had the patients not been given rescue medication or discontinued the treatment they were randomised to receive. This estimand, sometimes referred to as the efficacy estimand, targets the effect that would be seen if the treatment(s) were taken as intended and no other concomitant treatments were given.

Such 'de jure' estimands may clearly be of interest to various stakeholders in trials, particularly if we are interested in the biological effect of actually taking the treatment. The next question, however, is how we might attempt to estimate the de jure estimand from the observed data. De jure estimands usually implicitly assume the existence of a counterfactual (or potential) outcome for each patient, e.g. the outcome that would have been observed at the final follow-up had, possibly contrary to fact, the patient not received rescue medication or discontinued the treatment they were randomised to receive.

In the actual trial that took place, however, some patients did receive rescue medication, and so these counterfactual outcomes are missing for at least some patients. Consequently, even though in the trial that took place outcomes were measured on all patients at all time points, once we say we are interested in a de jure estimand, we may again have missing data, but this time missing counterfactual data.

### Assumptions and analysis methods

To handle this missing counterfactual data, we might appeal to the large body of methods that have been developed to handle missing data, e.g. maximum likelihood methods such as MMRM, multiple imputation and inverse probability weighting. All of these methods rely on assumptions, such as assumptions regarding the missingness mechanism and assumptions about the full data distribution.
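To make this concrete, here is a small multiple imputation sketch (numpy only, with simulated data; the linear outcome model, the 'missing at random given baseline' assumption, and all numeric values are assumptions made purely for illustration). Outcomes for patients who hypothetically received rescue medication are treated as missing counterfactuals and imputed from a regression on baseline:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 200, 20  # patients, number of imputations

# Hypothetical data: baseline HbA1c predicts final HbA1c
baseline = rng.normal(8.0, 1.0, n)
final = 0.7 * baseline + rng.normal(0.0, 0.5, n)

# Suppose patients with high baseline values went on to receive rescue
# medication; for a de jure estimand, their un-rescued counterfactual
# final outcome is then 'missing'
missing = baseline > 9.0
observed = final.copy()
observed[missing] = np.nan

# Multiple imputation sketch, assuming the counterfactual outcomes are
# missing at random given baseline: fit a regression of outcome on
# baseline among complete cases, then draw imputations with residual noise
cc = ~missing
X = np.column_stack([np.ones(cc.sum()), baseline[cc]])
coef, *_ = np.linalg.lstsq(X, observed[cc], rcond=None)
resid_sd = np.std(observed[cc] - X @ coef)

means = []
for _ in range(M):
    imputed = observed.copy()
    pred = coef[0] + coef[1] * baseline[missing]
    imputed[missing] = pred + rng.normal(0.0, resid_sd, missing.sum())
    means.append(imputed.mean())

# Combine the point estimates across imputations (a full analysis would
# also combine the variances via Rubin's rules)
print(f"MI estimate of mean counterfactual HbA1c: {np.mean(means):.2f}")
```

The sketch is deliberately minimal: it shows where the untestable assumption enters (the imputation model for outcomes we never observed), which is exactly the point at issue with de jure estimands.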

When we are dealing with missing actual (as opposed to counterfactual) data, we may be uncertain about some of these assumptions, because without seeing the missing data they cannot be fully verified. However, at least here, the data can be considered to exist and be well defined, and our task is to judge the plausibility of assumptions about these well defined but simply unobserved data.

In the case of missing counterfactual data, it seems to me that the task of judging these assumptions is qualitatively harder, because the counterfactual outcomes are arguably less well defined. To illustrate: when we ask what the outcome would have been had a patient not discontinued a treatment, how exactly do we intend to prevent the patient from discontinuing? Different ways of preventing discontinuation could potentially lead to different outcome values for the patient, and so arguably the counterfactual outcome is not as well defined as in the case of missing actual data. Consequently, judging the plausibility of assumptions is problematic when the quantities involved (the counterfactual outcomes) are not entirely clearly defined.

## Conclusions

The de facto estimand (and corresponding analysis) asks: what happened in the trial that actually took place? In the ideal situation of no missing actual data, a de facto analysis can typically be performed in such a way that minimal statistical assumptions need to be made.

In contrast, de jure estimands (and analyses) amount to trying to predict what would have been seen had the trial been run differently, e.g. had patients not been allowed to receive rescue medication. As such, attempting to estimate de jure estimands is arguably a much tougher challenge. Moreover, statistical analyses that target de jure estimands will, it seems to me, always have to rely on stronger assumptions than de facto analyses, unless we can design the trial to minimise those events (e.g. receipt of rescue medication, treatment discontinuation) which took place in our trial but which, for the purposes of answering our scientific question, we might have wished hadn't.

To be clear, I do think de jure estimands will sometimes (and indeed in some settings often) be the most relevant estimand of scientific interest in a trial. However, I don't think we should underestimate the challenges or assumptions needed to estimate them.

## Comments

Thanks for another very interesting post. It's interesting that the terms *de jure* and *de facto* are catching on.

This is slightly off the main topic of your post, but having read through Alan Phillips' slides, I'm incredulous that anyone might regard an analysis that estimates a different estimand as a 'sensitivity analysis'. It's analogous to asking two (related but) different questions: *What time was your train due?* vs. *What time did your train actually arrive?* Why should the answers be the same? For me, the terms 'internal' / 'external' sensitivity are then actively unhelpful!

Equally, the term ‘secondary analysis [of a specific outcome]’ is often associated with a secondary outcome rather than a secondary estimand. There must be better terms for these things… can we just decide on them once and for all here and now!

What are your thoughts on what a sensitivity analysis is? Am I using a much stricter definition than most people?

(The above partly repeats what Brennan Kahan, Ian White and I wrote here.)

Thanks Tim. First, I think that when presenting, Alan did reflect the position you have outlined. Nonetheless, it is the case that sometimes clinical colleagues are interested in how much the answer changes when you change the estimand, and I think that is what that slide content corresponds to.

I do agree with you Tim, but to play devil's advocate: there is nothing in the phrase 'sensitivity analysis' that specifies what aspect you are assessing sensitivity to. One aspect could be statistical modelling assumptions. Another could be the chosen estimand. Now, you might say it is nonsensical to ask 'if I change the question, does the answer change?', but when the questions are closely related (e.g. two estimands that differ in only one respect), maybe this is legitimate.

Thanks Jonathan. I agree with you about the term itself. Generic definitions I've seen tend to talk in terms of 'how uncertainty in a system's outputs depends on its inputs' (I think it comes from computer science), which is agnostic about what the inputs are. I don't think definitions exist that are specific to statistical science, much less randomised trials. To join you on the devil's advocate side, you could say, 'well, if no-one receives rescue medication then the estimates of the two estimands coincide, so we just want to know how different they become given that people *did* receive rescue medication'.

My concern comes from similar conversations with trialists (statisticians and clinicians) who 'just want to see if there is a difference' without engaging with the difference they are actually estimating! It feels a bit just-press-the-damn-analyse-button-and-tell-me-the-result-when-it's-done. Given the time and money invested in randomised trials I find this attitude hard to accept. It seems important for investigators to understand estimands and to make an informed decision about whether they would regard changing it as a 'sensitivity analysis'.

Sorry if I'm repeating myself. Thanks again for the post!