From DAGs to potential outcomes via Single World Intervention Graphs

Directed acyclic graphs (DAGs) are an exceptionally useful tool for graphically depicting assumptions about causal structure. An accompanying rich theory has been developed which enables one to (for example) determine if there exist sets of variables which if adjusted for would enable estimation of the causal effect of one (the exposure) of the variables in the DAG on another (the outcome). Personally I have found them very useful for thinking about missing data assumptions (see ‘Understanding missing at random dropout using DAGs‘, for example).

Separately, I have similarly found the concepts of potential outcomes (counterfactuals), as used extensively by Jamie Robins, very useful. The concept of defining causal effects as the difference between what one would observe if an exposure is set to one level as opposed to set to another level is extremely intuitive.

For many years I must admit I did not even notice that there was somewhat of a disconnect between DAGs and potential outcomes, in the sense that if you draw a DAG encoding your causal assumptions about a process in the world, the DAG does not contain any potential outcomes. As I noted earlier, DAGs have rules/conditions about when one can estimate the causal effect of an exposure in the presence of confounding, and the potential outcomes framework similarly has conditions sufficient to estimate the causal effect of exposure. But because the DAG doesn’t contain potential outcomes, it seemed difficult to directly connect these two frameworks. In particular, the no confounding or exchangeability assumption in the potential outcome framework can’t seemingly be checked from a DAG, since the DAG doesn’t contain the potential outcomes.

This post is about Single-World Intervention Graphs, which for me felt like a bit of revelation when I discovered them. They allow one to take a DAG and determine what would happen if one were to intervene to set the value of certain variables to certain values. In doing so, potential outcomes emerge into the graph, and enable us to (for example) check the exchangeability assumption. I draw heavily on HernĂ¡n and Robins’ Causal Inference book.

Read more

The mean of the residuals in logistic regression is always zero

Last year I wrote a post about how in linear regression the mean (and sum) of the residuals always equals zero, and so checking that the overall mean of the residuals is zero tells you nothing about the goodness of fit of the model. Someone asked me recently whether the same is true of logistic regression. The answer is yes for a particular residual definition, as I’ll show in this post.

Read more