# Estimating causal effects from observational studies

In this post I’ll continue my notes on Miguel Hernán and James Robins’ soon-to-be-published book, Causal Inference, looking at some of what is covered in their third chapter, on estimating causal effects from observational studies.

## Estimating causal effects from observational studies
The natural concern with an observational study is that if we compare outcomes between those who received the treatment and those who received the control, differences could be at least partly due to inherent differences between the two groups of individuals. Specifically, the groups could differ with respect to other factors (confounders) which themselves affect the outcome in question. This can arise when the decision whether or not to give the treatment depends on characteristics of the individuals in question.

## An observational study as a conditionally randomized trial
Hernán and Robins then explain how an observational study can be treated as if it were a conditionally randomized experiment, provided three conditions hold. First, the two (or more) treatments being compared are well defined. Second, the probability that each individual receives each treatment depends only on the measured covariates (conditional exchangeability). Third, this probability is greater than zero for each treatment and each possible value of the covariates (positivity).
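In symbols, the three conditions are often written along the following lines. This is a sketch in notation that does not appear in the post itself, so the symbols are assumptions: $A$ is the treatment, $L$ the measured covariates, $Y$ the observed outcome, and $Y^a$ the outcome an individual would have had under treatment level $a$. The first line corresponds to well-defined treatments (usually labelled consistency), the second to treatment depending only on the measured covariates, and the third to positivity.

$$
\begin{aligned}
\text{Consistency:} \quad & Y = Y^a \ \text{ for every individual with } A = a \\
\text{Conditional exchangeability:} \quad & Y^a \perp\!\!\!\perp A \mid L \ \text{ for every } a \\
\text{Positivity:} \quad & \Pr[A = a \mid L = l] > 0 \ \text{ for every } a \text{ and every } l \text{ with } \Pr[L = l] > 0
\end{aligned}
$$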

If these conditions can be argued to hold in a particular situation, we may act as if our observational study were a conditionally randomized experiment, and can therefore estimate average causal effects by inverse probability weighting or standardisation, as described previously. These conditions are referred to as identifiability conditions: conditions which suffice to allow us to estimate average causal effects.
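To make the two estimators concrete, here is a minimal Python sketch, assuming a single binary covariate L, a binary treatment A and a binary outcome Y. The counts and the function names are hypothetical, chosen for illustration only, and the sketch takes the identifiability conditions for granted.

```python
# Hypothetical counts, for illustration only (not from the post):
# counts[(l, a)] = (number of individuals, number with outcome Y = 1)
# for covariate level L = l and treatment A = a.
counts = {
    (0, 0): (4, 1),
    (0, 1): (4, 1),
    (1, 0): (3, 2),
    (1, 1): (9, 6),
}
levels_l = (0, 1)
total = sum(n for n, _ in counts.values())


def stratum_size(l):
    """Number of individuals with L = l, across both treatments."""
    return counts[(l, 0)][0] + counts[(l, 1)][0]


def crude_risk(a):
    """Unadjusted risk Pr[Y = 1 | A = a]."""
    n = sum(counts[(l, a)][0] for l in levels_l)
    events = sum(counts[(l, a)][1] for l in levels_l)
    return events / n


def standardized_risk(a):
    """Average the stratum-specific risks over the distribution of L."""
    risk = 0.0
    for l in levels_l:
        n_la, events_la = counts[(l, a)]
        risk += (events_la / n_la) * (stratum_size(l) / total)
    return risk


def ipw_risk(a):
    """Weight each individual with A = a by 1 / Pr[A = a | L = l]."""
    weighted_events = 0.0
    weighted_n = 0.0
    for l in levels_l:
        n_la, events_la = counts[(l, a)]
        weight = stratum_size(l) / n_la  # inverse of Pr[A = a | L = l]
        weighted_events += weight * events_la
        weighted_n += weight * n_la
    return weighted_events / weighted_n


for name, risk in [
    ("crude", crude_risk),
    ("standardized", standardized_risk),
    ("IPW", ipw_risk),
]:
    print(f"{name}: risk ratio = {risk(1) / risk(0):.3f}")
```

With these made-up counts the crude risk ratio is about 1.26, while standardisation and inverse probability weighting both give a risk ratio of 1. The two adjusted estimates agree here because, with a single discrete covariate and no modelling, the two methods are algebraically equivalent.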

Of course, in practice we can never be sure that we have collected information on all the variables which affect treatment assignment (strictly, those which also affect the outcome). This is the assumption of no unmeasured confounders. Inferring causal effects from observational studies therefore rests on a fundamentally untestable assumption.
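A small numerical sketch shows why this matters. The population below is entirely hypothetical: a binary variable U affects both who gets treated and the outcome, while the treatment itself has no effect at all. If U was never measured we can only compute the first quantity, and nothing in the observed data tells us it is wrong.

```python
# Hypothetical population, for illustration only (not from the post).
# U is a binary confounder; treatment A has NO effect on outcome Y here:
# within each level of U the risk is the same whether A = 0 or A = 1.
# counts[(u, a)] = (number of individuals, number with Y = 1)
counts = {
    (0, 0): (800, 80),   # U = 0, untreated: risk 0.10
    (0, 1): (200, 20),   # U = 0, treated:   risk 0.10
    (1, 0): (200, 100),  # U = 1, untreated: risk 0.50
    (1, 1): (800, 400),  # U = 1, treated:   risk 0.50
}
total = sum(n for n, _ in counts.values())


def risk_ignoring_u(a):
    """What we can compute if U was never measured."""
    n = counts[(0, a)][0] + counts[(1, a)][0]
    events = counts[(0, a)][1] + counts[(1, a)][1]
    return events / n


def risk_standardized_over_u(a):
    """What we could compute if U had been measured."""
    risk = 0.0
    for u in (0, 1):
        n_u = counts[(u, 0)][0] + counts[(u, 1)][0]
        n_ua, events_ua = counts[(u, a)]
        risk += (events_ua / n_ua) * (n_u / total)
    return risk


naive_difference = risk_ignoring_u(1) - risk_ignoring_u(0)
adjusted_difference = risk_standardized_over_u(1) - risk_standardized_over_u(0)
print(f"risk difference ignoring U:  {naive_difference:.2f}")
print(f"risk difference adjusting U: {adjusted_difference:.2f}")
```

In this made-up population the naive risk difference is 0.24 even though the treatment does nothing, and it disappears once we standardise over U.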

## On the definition of interventions/treatments
Hernán and Robins next give an extensive discussion of why interventions must be well defined in order to make causal inferences. They use the example of obesity to illustrate the problems with drawing causal inferences from observational studies in which the exposure of interest does not clearly correspond to a specific intervention. In the case of obesity, we may compare future outcomes (e.g. all-cause mortality in the 10 years after joining a study) between those who are obese and those who are not obese at baseline.

After adjustment for confounders, we may attempt to attach a causal interpretation to an estimate comparing obese to non-obese individuals. The problem is that there are many possible interventions we could conceive of to modify an individual’s obesity status, and each of these interventions could have a different effect on the outcome. Hernán and Robins then go on to argue that ambiguity about the interventions corresponding to an exposure makes it seriously difficult to justify the conditions described earlier, under which an observational study can be analyzed as if it were a conditionally randomized experiment.
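The arithmetic behind this concern can be sketched in a few lines of Python. Every number and intervention name below is hypothetical. The point is only that "the effect of not being obese" is not a single quantity when different routes to non-obesity carry different risks.

```python
# All numbers are hypothetical and chosen only for illustration.
# 10-year mortality risk if an individual remains obese.
risk_if_obese = 0.20

# 10-year mortality risk under different interventions that each make
# the individual non-obese (different "versions" of the exposure).
risk_by_intervention = {"diet": 0.15, "exercise": 0.12, "surgery": 0.18}


def risk_difference(mix):
    """Risk difference (non-obese minus obese) when non-obesity is
    achieved through the given mix of interventions.

    `mix` maps each intervention to the proportion of individuals
    receiving it; the proportions must sum to 1.
    """
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    risk_if_non_obese = sum(
        share * risk_by_intervention[name] for name, share in mix.items()
    )
    return risk_if_non_obese - risk_if_obese


mostly_lifestyle = {"diet": 0.5, "exercise": 0.5, "surgery": 0.0}
mostly_surgery = {"diet": 0.2, "exercise": 0.1, "surgery": 0.7}

print(f"{risk_difference(mostly_lifestyle):.3f}")
print(f"{risk_difference(mostly_surgery):.3f}")
```

Under these assumed risks the two mixes give risk differences of about -0.065 and -0.032, so the "effect of obesity" depends on which interventions we have in mind.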

The need to define clearly an intervention corresponding to the exposure makes it problematic, or so it would seem, to attach causal interpretations to many of the analyses that are commonly performed. Socio-economic status is a further exposure for which a causal interpretation seems difficult to attach: one can think of a plethora of interventions which might modify it, and these may well differ in their effects on the outcome of interest.