A student asked me today about the differences between confounding and effect modification. In this post I'll try to distinguish the two concepts and illustrate the differences using some simple, very large simulated datasets in R.
Confounding
Let X denote a binary exposure of interest, and Y a binary outcome. Let C be a third binary variable. C is a confounder for the effect of X on Y if C causally affects the exposure X and also causally affects Y other than through X, i.e. C is a common cause of X and Y. Let's simulate a very large dataset consistent with this causal setup:
set.seed(1234)
n <- 10000000
#confounder
c <- rbinom(n=n,size=1,prob=0.5)
#exposure, with prevalence depending on c
x <- rbinom(n=n,size=1,prob=0.25+0.5*c)
#outcome, with risk depending on both c and x
y <- rbinom(n=n,size=1,prob=exp(-2-1*c+1*x))
We first generate the confounder C. We then generate X with a probability (of being 1) that depends on C, since C affects X. We then generate the binary outcome Y with a probability that depends on the value of both C and X.
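As a quick check of this confounding structure, we can compare the prevalence of exposure across the levels of C; by construction these should be close to 0.25 and 0.75:
#exposure prevalence by level of the confounder
mean(x[c==0])
mean(x[c==1])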
To naively estimate the effect of the exposure X on outcome Y, we could estimate the association between X and Y, ignoring C. To do this we must choose an effect measure to use. To begin with, let’s use the risk ratio. We thus calculate the ratio of the risk/probability of Y=1 in the X=1 group to the risk of Y=1 in the X=0 group:
#unadjusted risk ratio
mean(y[x==1])/mean(y[x==0])
[1] 1.698805
This analysis suggests that being exposed increases the risk of Y=1 by approximately 70%. However, we have ignored the confounder C: the association we have just estimated is a combination of the causal effect of the exposure X on the outcome Y and the so-called 'backdoor path' between X and Y that goes via C.
There are many different ways to adjust for confounders. This post is not primarily about methods for adjustment, so for now we will use stratification on the confounder C to adjust. To apply this, we re-calculate the risk ratio separately in the C=0 and C=1 groups:
#conditional risk ratio in c=0
mean(y[(x==1) & (c==0)])/mean(y[(x==0) & (c==0)])
[1] 2.720097
#conditional risk ratio in c=1
mean(y[(x==1) & (c==1)])/mean(y[(x==0) & (c==1)])
[1] 2.695138
Once we stratify on C, we see a larger risk ratio than the naive unadjusted one. This difference is due to confounding. Here 2.72 (exp(1) in fact, given how we simulated Y) is the true causal risk ratio for the effect of X on Y. The earlier unadjusted risk ratio is biased downwards in this setup because those with X=1 were more likely to have C=1, and the causal effect of C=1 vs C=0 here was to lower the probability that Y=1. Thus the effect of confounding here was to make the apparent effect of exposure smaller than it really is.
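As an aside, the same conditional risk ratio could be estimated by regression rather than stratification. The following is a minimal sketch using a log-binomial model, which here happens to match the true data generating model; the starting values are an assumption added to help convergence, since binomial models with a log link can be numerically fragile (and the fit will be slow with 10 million observations):
#log-binomial regression adjusting for c
fit <- glm(y ~ x + c, family=binomial(link="log"), start=c(-2,1,-1))
#the exponentiated coefficient of x estimates the conditional risk ratio
exp(coef(fit)["x"])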
Effect (measure) modification
The risk ratios we just estimated in the C=0 and C=1 groups were close. In fact, from the way we simulated the Y variable (using a probability function that is log linear in C and X), we know that the true risk ratio conditional on C=0 is the same as the true risk ratio conditional on C=1. In this case we say C is not an effect modifier for the effect of X on Y, because the causal risk ratios conditional on C are the same. Thus in our setup so far we have confounding, but apparently no effect modification: the effect of X on Y was the same in the two levels of C.
We chose to use the risk ratio as our effect measure. What if we had instead chosen to use the risk difference? Let’s re-calculate the stratified estimates but now using the risk difference measure:
#conditional risk diff in c=0
mean(y[(x==1) & (c==0)])-mean(y[(x==0) & (c==0)])
[1] 0.2323153
#conditional risk diff in c=1
mean(y[(x==1) & (c==1)])-mean(y[(x==0) & (c==1)])
[1] 0.08506539
We now see a material difference in the causal effect of X on Y between the C=0 and C=1 groups when we measure the effect using a risk difference. So do we have effect modification or not? The answer is that effect modification depends on the choice of effect measure. If you have no effect modification on one measure, you will generally have effect modification on others. Here we simulated data such that the risk ratio was the same in the two strata of C; because the baseline risks (among those with X=0) differ between the strata, this implies that the risk differences in the C=0 and C=1 groups must differ. This is why it is probably better to refer to effect modification as effect measure modification (see Chapter 4 of Hernán and Robins).
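We can see this directly from the data generating mechanism. The true risk of Y=1 is exp(-2-1*c+1*x), so the true risk ratio is exp(1) in both strata of C, but applying this common ratio to the two different baseline risks gives different risk differences:
#true risk difference in c=0
exp(-1)-exp(-2)
[1] 0.2325442
#true risk difference in c=1
exp(-2)-exp(-3)
[1] 0.08554821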
Effect modification without confounding
The previous simulated example had confounding, and effect measure modification for the risk difference. Let's now re-simulate to demonstrate that we can have effect measure modification in the absence of confounding. To do this we modify the line that generates X so that the probability that X=1 no longer depends on C:
set.seed(1234)
c <- rbinom(n=n,size=1,prob=0.5)
#x is now generated independently of c, so there is no confounding
x <- rbinom(n=n,size=1,prob=0.5)
y <- rbinom(n=n,size=1,prob=exp(-2-1*c+1*x))
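As a quick check that we have removed the confounding, the prevalence of exposure should now be close to 0.5 in both levels of C:
#exposure prevalence no longer depends on c
mean(x[c==0])
mean(x[c==1])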
Let’s now estimate the risk ratio again ignoring C:
#unadjusted risk ratio
mean(y[x==1])/mean(y[x==0])
[1] 2.717505
And if we stratify by C:
#conditional risk ratio in c=0
mean(y[(x==1) & (c==0)])/mean(y[(x==0) & (c==0)])
[1] 2.719403
#conditional risk ratio in c=1
mean(y[(x==1) & (c==1)])/mean(y[(x==0) & (c==1)])
[1] 2.714936
Here, in the absence of confounding, we see that the stratified risk ratios are essentially identical to the crude (unadjusted) risk ratio. But if we re-calculate the risk differences in the C=0 and C=1 groups we have:
#conditional risk diff in c=0
mean(y[(x==1) & (c==0)])-mean(y[(x==0) & (c==0)])
[1] 0.232282
#conditional risk diff in c=1
mean(y[(x==1) & (c==1)])-mean(y[(x==0) & (c==1)])
[1] 0.08537318
This illustrates that it is perfectly possible to have effect measure modification in the absence of confounding.
Adjusting for confounding and collapsibility
So far, to adjust for confounding, we have calculated our effect measure stratified on the confounder C. In the first example, where we had confounding, the effect measure estimate ignoring C differed materially from the estimates stratified on C. In the second example, where there was no confounding, they differed only slightly (due to sampling variability). Let's now simulate a third dataset, again without confounding, but with a modified expression for how the probability that Y=1 depends on C and X:
set.seed(1234)
#expit is the inverse logit function
expit <- function(x) exp(x)/(1+exp(x))
c <- rbinom(n=n,size=1,prob=0.5)
x <- rbinom(n=n,size=1,prob=0.5)
#y now follows a logistic model in c and x
y <- rbinom(n=n,size=1,prob=expit(-2-1*c+1*x))
The odds ratio is a popular alternative effect measure for binary outcomes. The odds of an event is simply the probability of it occurring divided by one minus this probability.
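To keep the calculations below readable, let's first define a small helper function that converts a probability to an odds:
odds <- function(p) p/(1-p)
First, the odds ratio for the effect of X on Y, ignoring C: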
#unadjusted odds ratio
odds(mean(y[x==1]))/odds(mean(y[x==0]))
[1] 2.645571
Now let’s estimate the odds ratio separately in the two levels of C:
#conditional odds ratio in c=0
odds(mean(y[x==1 & c==0]))/odds(mean(y[x==0 & c==0]))
[1] 2.715072
#conditional odds ratio in c=1
odds(mean(y[x==1 & c==1]))/odds(mean(y[x==0 & c==1]))
[1] 2.713476
Unlike in our earlier examples, we now see that the stratified effects (measured as odds ratios) are a little larger than the unstratified effect. The difference is not due to sampling variability, and it is not due to confounding, because we generated X and C here as entirely independent random variables. The explanation is that the odds ratio is not collapsible (see Fine Point 4.3 of Hernán and Robins). An effect measure is collapsible if the overall population causal effect of exposure on outcome is a weighted average of the stratum specific effects. The risk difference and risk ratio are collapsible, but the odds ratio is not.
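We can verify this using the data generating mechanism. The true probability that Y=1 is expit(-2-1*c+1*x), so the true conditional and marginal odds ratios can be calculated exactly:
#true conditional odds ratio: under the logistic model the odds of Y=1 are
#exp(-2-1*c+1*x), so the odds ratio is exp(1) in both strata
exp(1)
[1] 2.718282
#true marginal risks if everyone were exposed/unexposed
prx1.true <- 0.5*expit(-1)+0.5*expit(-2)
prx0.true <- 0.5*expit(-2)+0.5*expit(-3)
#true marginal odds ratio
odds(prx1.true)/odds(prx0.true)
[1] 2.649522
The true marginal odds ratio is smaller than both true conditional odds ratios, and so cannot be a weighted average of them.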
We can in fact use C to calculate an estimate of the population (marginal) causal odds ratio for the effect of X on Y. One method for doing this is standardisation (see Section 2.3 of Hernán and Robins). This involves first estimating the probability that Y=1 if the whole population were exposed, by averaging the estimated probabilities that Y=1 given X=1 in the two levels of C over the observed distribution of C. This probability can then be converted to the marginal odds that Y=1 if everyone were exposed. We then repeat the calculation imagining everyone were unexposed, and finally take the ratio of the two odds. In R we can do this as follows:
#standardised risk if everyone were exposed
prx1 <- mean(c==0)*mean(y[x==1 & c==0])+mean(c==1)*mean(y[x==1 & c==1])
#standardised risk if everyone were unexposed
prx0 <- mean(c==0)*mean(y[x==0 & c==0])+mean(c==1)*mean(y[x==0 & c==1])
#marginal OR
odds(prx1)/odds(prx0)
[1] 2.646265
This is an estimate of the marginal odds ratio for the effect of X on Y. It is close to the estimate we first obtained when we ignored C. Here there was no confounding, and so there was no need to do this. But if we did have confounding, this procedure would allow us to estimate the marginal odds ratio for the effect of X on Y adjusting for confounding by C. Interestingly, Hernán and Robins suggest that the odds ratio would rarely be the causal parameter of interest:
We do not consider effect modification on the odds ratio scale because the odds ratio is rarely, if ever, the parameter of interest for causal inference.
Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Standardisation methods can also be useful in the absence of confounding in other situations. One such situation is randomised trials, where we can use baseline covariates to improve the efficiency of inferences for the treatment effect (see Zhang et al 2008) or for the mean outcome under each of the treatments (see Bartlett 2018).
For those interested in reading further about collapsibility of effect measures, especially from a causal inference perspective, I recommend reading Daniel et al 2020.
Excellent teaching example, Jonathan, thanks. We have another illustrative example for effect modification and collapsibility here: https://github.com/migariane/HETMOR-Causal-Inference
I don't see why such a huge deal is made about non-collapsibility of the odds ratio. Mathematically, only non-collapsible effect measures can have a zero product term when there is no interaction, and collapsibility is just an indicator of the need for a product term to make the model fit the data despite there being no interaction.
In the non-confounder example by Jonathan above, the coefficient of the product term is approximately zero (0.0002) for a log-binomial model, but do we interpret this as no modification of the effect of x on y by c? The answer is no, because ultimately this depends on the actual effect and not the modelled effect, and only the logistic model delivering a coefficient of 0 can give us that confidence (which in this case it does not, suggesting a slight but perhaps clinically negligible effect of c on the effect of x is present). A simple example is one given by Greenland, where rows and columns are c and x, with the probability of y as follows:
P00=0.2, P01=0.4, P10=0.6 and P11=0.8.
A simple reconstruction of the data will demonstrate that the product term coefficient on the logistic scale is zero, while on the log-binomial scale it is -0.405. Does that mean there is effect modification here? Of course not, and not on any scale.
Finally, why is the conditional OR for x 2.67 while the marginal OR is 2.25? This is because c is prognostic for y, and thus the population effect does not exclude the effect of c; depending on the population distribution of c, the OR will lie anywhere between the c-devoid value of 2.67 and a smaller value. Thus non-collapsibility of an effect measure is nothing more than systematic error induced by a non-measured prognostic variable. To quote Frank Harrell: despite all the amazing work done by causal inference experts, their take on collapsibility and odds ratios is something that they have perpetrated on us that requires change for the betterment of medical decision making.