A common situation arises when one wants to estimate the effect of a treatment or exposure at some time point t in an observational cohort or randomised trial. For example, what is the mean difference in some outcome Y at time t between the two groups of interest? To make things a bit simpler, let’s suppose that subjects were allocated to the two groups (e.g. two treatments A and B) randomly, as in a randomised trial. Now suppose that some of the subjects die before time t, such that their outcome Y is not observed. Then we can no longer compare Y between the two groups in all subjects, because some values of Y are missing, or rather truncated by death.
This issue arises commonly in studies of conditions in humans where there is non-negligible mortality during the study, or in studies of elderly populations.
Missing due to death is different
The first thought is to treat the problem as a missing data problem, where Y is missing in those subjects who do not survive to time point t. However, as has been noted by many authors, these values are not missing in the usual sense (values that exist, were intended to be measured, but for some reason were not measured), since they do not exist. We should therefore be cautious about using statistical techniques that attempt to impute these unobserved Ys or are equivalent to imputing them (e.g. fitting a linear mixed model to all subjects).
A great paper giving an overview of the possible analysis approaches and target parameters (estimands) is the one by Kurland et al.
An invalid approach
The most obvious approach is to compare the outcome Y in those who survived in treatment group A to the outcome Y in those who survived in treatment group B. The problem here is that if treatment has an effect on survival, the survivors in group A and the survivors in group B will in general not be comparable in respect of measured and unmeasured baseline variables. That is, we have lost the benefit of randomisation. A nice paper by Chiba and VanderWeele discusses this.
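To make the problem concrete, here is a small simulation sketch. Everything in it is an assumption of mine for illustration (the frailty variable, the survival probabilities and the outcome model are invented, not taken from any of the papers mentioned). Treatment has no effect on Y for anyone, but treatment A keeps more frail subjects alive, so the survivors in group A are on average frailer than the survivors in group B.

```python
import random


def simulate_naive_contrast(n=100_000, seed=1):
    """Survivor-only difference in mean Y (A minus B) in a simulated trial.

    Treatment has no effect on Y for any subject, but treatment A improves
    survival among frail subjects, so the two survivor groups differ.
    """
    rng = random.Random(seed)
    sums = {"A": 0.0, "B": 0.0}
    counts = {"A": 0, "B": 0}
    for _ in range(n):
        arm = "A" if rng.random() < 0.5 else "B"   # randomised allocation
        frail = rng.random() < 0.5                 # unmeasured baseline frailty
        if frail:
            # Frail subjects die more often; treatment A rescues some of them.
            p_survive = 0.8 if arm == "A" else 0.4
        else:
            p_survive = 0.95
        survived = rng.random() < p_survive
        # Y depends on frailty only, not on the treatment arm.
        y = rng.gauss(-1.0 if frail else 1.0, 1.0)
        if survived:
            sums[arm] += y
            counts[arm] += 1
    return sums["A"] / counts["A"] - sums["B"] / counts["B"]


if __name__ == "__main__":
    print(simulate_naive_contrast())
```

With these made-up numbers the expected survivor-only difference works out to about -0.32, even though the true effect of treatment on Y is zero for every subject.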
Principal stratification for truncation by death
One approach to solving the problem is to use principal stratification. Principal stratification has been proposed as a way of thinking about a number of problems, such as non-compliance and truncation by death. I recommend reading a very nice paper by Rubin on the application of principal stratification to the truncation by death problem for more details. The idea, when applied to the problem of truncation by death, is relatively simple, even if the implementation isn’t necessarily. The idea is as follows:
1) a causal effect is first defined at the individual level as a contrast between their potential outcomes for Y at time t under assignment to treatment A and to treatment B, which we denote Y(A) and Y(B).
2) this contrast is only well defined in those subjects for whom Y(A) and Y(B) are both well defined, and this is the sub-population of subjects who would survive to time point t whichever treatment they were assigned, i.e. under both A and B. This is one of the so-called principal strata, where stratum membership here is defined by survival status to time t under each treatment.
3) therefore, if we want to estimate the effect of A vs B on Y at time t, this is only a well defined estimand in the principal stratum of subjects who would survive to time point t whether we assign them to treatment A or to treatment B.
4) because principal stratum membership is determined by a subject's potential survival statuses under A and under B, which, like baseline covariates, are unaffected by the treatment actually assigned, within this stratum those randomised to A and those randomised to B are balanced in respect of all measured and unmeasured baseline variables, i.e. we retain the benefit of randomisation.
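In symbols, the estimand described in the steps above can be written as follows. The notation is my own shorthand: D(A) and D(B) denote a subject's survival status at time t (1 = alive, 0 = dead) under assignment to A and to B, and \Delta is just a label for the estimand.

```latex
\Delta = E\left[\, Y(A) - Y(B) \;\middle|\; D(A) = 1,\ D(B) = 1 \,\right]
```

The conditioning event D(A) = 1, D(B) = 1 picks out the stratum of subjects who would survive under either treatment, which is the only stratum in which both Y(A) and Y(B) exist.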
The question then remains: how do we estimate the effect of treatment in this principal stratum? We need to compare the observed outcome Y between those randomised to A and those randomised to B, but only using those subjects who are in the stratum who would survive under assignment to either A or B. If we observe that a subject did not survive, we know they cannot be in this stratum. If they survived, they are in this stratum only if they also would have survived had they been randomised to the other treatment (i.e. the treatment they were not in fact randomised to), which is something we never observe.
Without further assumptions, unfortunately the effect in this principal stratum is not ‘identified’, or put another way, we cannot estimate it. Various approaches have thus been proposed which make assumptions that enable us to estimate the effect – see references 1-16 in the paper by Chiba and VanderWeele.
Exploiting baseline covariates
One approach that seems quite appealing to me was proposed by Hayden et al in 2005. It makes use of baseline covariates X, and makes the assumption that a subject's survival status under assignment to A, D(A), and their survival status under assignment to B, D(B), are conditionally independent given X. It also assumes that, conditional on X and on a subject surviving when randomised to A, their survival status under B is independent of Y(A), and vice versa (swapping A and B). To make these assumptions more plausible we should collect and use baseline covariates X which are strongly predictive of survival.
To estimate the effect in the principal strata is then remarkably simple:
1) fit a logistic regression model for survival to time t (coded survival=1, death=0) in those randomised to A, with X as covariates. Use this model to calculate a fitted probability of survival for all subjects, including those in group B, denoted P_A.
2) do the same in group B, and again generate fitted probabilities of survival for all subjects, denoted P_B.
3) calculate the weighted mean of Y in those who survived and were in group A, weighting each subject’s contribution by P_B. Calculate the weighted mean of Y in those who survived and were in group B, weighting each subject by P_A. The difference in these weighted means is an estimate of the effect of treatment on Y in the principal stratum of subjects who would survive under assignment to either A or B.
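The three steps above can be sketched in plain Python as follows. This is my own minimal illustration, not code from Hayden et al: the function names (fit_logistic, predict, principal_stratum_effect) and the data layout are assumptions, the logistic models are fitted with a basic Newton-Raphson loop that has no convergence check, and in practice one would use a proper statistics package.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _sigmoid(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ds, iters=25):
    """Newton-Raphson fit of P(D=1 | X); xs is a list of covariate lists."""
    rows = [[1.0] + list(x) for x in xs]  # prepend an intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, d in zip(rows, ds):
            p = _sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (d - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, _solve(hess, grad))]
    return beta


def predict(beta, x):
    """Fitted survival probability for covariate list x."""
    return _sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def principal_stratum_effect(arm, x, d, y):
    """Weighted difference in mean Y (A minus B) among survivors.

    arm[i] is "A" or "B", x[i] a list of baseline covariates, d[i] is 1 if
    subject i survived to time t (else 0), and y[i] is the outcome, which is
    never read when d[i] == 0.
    """
    # Steps 1 and 2: one survival model per randomised group.
    fit = {}
    for g in ("A", "B"):
        idx = [i for i, a in enumerate(arm) if a == g]
        fit[g] = fit_logistic([x[i] for i in idx], [d[i] for i in idx])
    # Step 3: survivors in each group, weighted by the OTHER group's model.
    means = {}
    for g, other in (("A", "B"), ("B", "A")):
        num = den = 0.0
        for i, a in enumerate(arm):
            if a == g and d[i] == 1:
                w = predict(fit[other], x[i])
                num += w * y[i]
                den += w
        means[g] = num / den
    return means["A"] - means["B"]
```

Note that survivors in group A are weighted using the model fitted in group B and vice versa, which is why each model has to produce fitted probabilities for subjects outside the group it was fitted in.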
The intuition for the estimate is as follows. To estimate the mean of Y under assignment to A in those who would survive to time t under assignment to either treatment, we only want to include those subjects who we think are quite likely to have survived had they been randomized to B. Thus subjects with high values of P_B are given a large weight, while those we predict to have a lower chance of surviving to time t under randomization to B are given a lower weight. To estimate the mean of Y under assignment to B, we do the same, but now weight using P_A.
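For readers who prefer symbols, here is my own sketch of why the weighting works under the two assumptions stated earlier; it is not the derivation as presented by Hayden et al. The notation is assumed for illustration: Z is the randomised group, D the observed survival status, P_B(X) = P(D(B) = 1 | X), and the product Y(A) D(A) is taken to be zero when D(A) = 0.

```latex
\begin{align*}
E[\,Y(A) \mid D(A)=1,\ D(B)=1\,]
  &= \frac{E[\,Y(A)\,D(A)\,D(B)\,]}{E[\,D(A)\,D(B)\,]} \\
  &= \frac{E[\,Y(A)\,D(A)\,P(D(B)=1 \mid D(A)=1,\ Y(A),\ X)\,]}
          {E[\,D(A)\,P(D(B)=1 \mid D(A)=1,\ X)\,]} \\
  &= \frac{E[\,Y(A)\,D(A)\,P_B(X)\,]}{E[\,D(A)\,P_B(X)\,]} \\
  &= \frac{E[\,Y\,D\,P_B(X) \mid Z = A\,]}{E[\,D\,P_B(X) \mid Z = A\,]}
\end{align*}
```

The second line uses iterated expectations, the third uses the two conditional independence assumptions, and the fourth uses randomisation. The last line is the P_B-weighted mean of Y among survivors in group A, and randomisation also means that P_B(X) can be estimated from the survival model fitted in group B. Swapping A and B gives the other weighted mean.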
As well as the two assumptions stated earlier, we are also relying on the two logistic regression models being correctly specified. Although the approach makes strong assumptions, I find it appealing because the assumptions are fairly easy to understand, and the estimator is relatively simple to implement.
Note: to calculate standard errors and confidence intervals, Hayden et al give expressions for the sandwich standard error. Alternatively one could easily bootstrap the procedure.
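A percentile bootstrap could be sketched along the following lines. This is a generic illustration under my own assumptions: bootstrap_ci and its arguments are hypothetical names, and the estimator argument stands for any function that takes a list of per-subject records and returns the estimate. The important point is that both logistic models should be refitted inside the estimator for every resample, so that the uncertainty in the weights is reflected in the interval.

```python
import random
import statistics


def bootstrap_ci(records, estimator, n_boot=1000, level=0.95, seed=0):
    """Bootstrap standard error and percentile interval for an estimator.

    records: list with one entry per subject, e.g. (arm, x, d, y) tuples.
    estimator: callable mapping such a list to a single number; any model
        fitting must happen inside it so each resample refits the models.
    """
    rng = random.Random(seed)
    n = len(records)
    stats = []
    for _ in range(n_boot):
        # Resample whole subjects with replacement.
        sample = [records[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(sample))
    stats.sort()
    lo = stats[int((1 - level) / 2 * (n_boot - 1))]
    hi = stats[int((1 + level) / 2 * (n_boot - 1))]
    return statistics.stdev(stats), (lo, hi)


if __name__ == "__main__":
    # Toy usage with the sample mean standing in for the real estimator.
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print(bootstrap_ci(data, statistics.mean, n_boot=500))
```

The toy usage uses the sample mean purely as a stand-in; for the method described here one would pass a function that runs all three estimation steps on the resampled subjects.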