G-formula (sometimes known as G-computation) is an approach for estimating the causal effects of treatments or exposures that can vary over time and that are subject to time-varying confounding. It is one of the so-called G-methods developed by Jamie Robins and co-workers. For a nice overview of these, I recommend this open access paper by Naimi et al. (2017), and for more details, the What If book by Hernán and Robins. In this post, I’ll describe some recent work with Camila Olarte Parra and Rhian Daniel in which we have explored the use of multiple imputation methods and software as a route to implementing G-formula estimators.
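To give a flavour of what the G-formula computes, here is a sketch for the simplest time-varying setting. The notation is an assumption for illustration and is not taken from the work described here: treatment is given at two times, $A_0$ and $A_1$, a discrete time-varying confounder $L_1$ is measured between them, $Y$ is the final outcome, and there are no baseline confounders. Under the usual identification assumptions (consistency, sequential exchangeability and positivity), the mean outcome had everyone received treatments $(a_0, a_1)$ is

$$
E\left[Y^{a_0,a_1}\right] = \sum_{l_1} E\left[Y \mid A_0 = a_0, L_1 = l_1, A_1 = a_1\right] \, P\left(L_1 = l_1 \mid A_0 = a_0\right).
$$

The point to notice is that $L_1$ is averaged over its distribution given the earlier treatment $a_0$, rather than being conditioned on in a single outcome regression.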

# Causal inference

## From DAGs to potential outcomes via Single World Intervention Graphs

Directed acyclic graphs (DAGs) are an exceptionally useful tool for graphically depicting assumptions about causal structure. A rich accompanying theory has been developed which enables one to determine (for example) whether there exists a set of variables which, if adjusted for, would enable estimation of the causal effect of one variable in the DAG (the exposure) on another (the outcome). Personally, I have found them very useful for thinking about missing data assumptions (see ‘Understanding missing at random dropout using DAGs’, for example).

Separately, I have similarly found the concept of potential outcomes (counterfactuals), as used extensively by Jamie Robins, very useful. Defining a causal effect as the difference between what one would observe if an exposure were set to one level and what one would observe if it were set to another level is extremely intuitive.
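In symbols, and with notation that is an assumption for illustration rather than anything fixed by the text above: write $Y^{a}$ for the outcome that would be observed if the exposure were set to level $a$. For a binary exposure, one common population-level version of this contrast is the average causal effect,

$$
E\left[Y^{a=1}\right] - E\left[Y^{a=0}\right].
$$

Each individual reveals at most one of $Y^{a=1}$ and $Y^{a=0}$, which is why further assumptions are needed before this quantity can be estimated from observed data.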

For many years, I must admit, I did not even notice that there was something of a disconnect between DAGs and potential outcomes: if you draw a DAG encoding your causal assumptions about a process in the world, the DAG does not contain any potential outcomes. As I noted earlier, DAGs come with rules for determining when one can estimate the causal effect of an exposure in the presence of confounding, and the potential outcomes framework similarly has conditions that are sufficient to estimate the causal effect of an exposure. But because the DAG doesn’t contain potential outcomes, it seemed difficult to connect the two frameworks directly. In particular, the no confounding (or exchangeability) assumption of the potential outcomes framework seemingly can’t be checked from a DAG, because that assumption is a statement about potential outcomes.

This post is about Single-World Intervention Graphs, which felt like a bit of a revelation when I discovered them. They allow one to take a DAG and determine what would happen if one were to intervene to set certain variables to particular values. In doing so, potential outcomes appear in the graph, which enables us to (for example) check the exchangeability assumption. I draw heavily on Hernán and Robins’ Causal Inference book.

## Confounding vs. effect modification

A student asked me today about the difference between confounding and effect modification. In this post I’ll try to distinguish the two conceptually and illustrate the differences using some simple, very large simulated datasets in R.
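As a rough illustration of the kind of simulation meant here, the following is a minimal sketch in Python using only the standard library. It is not the post's own R code, and every variable name, probability and effect size in it is an assumption chosen for illustration. In the first scenario a binary variable `l` affects both exposure and outcome (confounding) while the true effect of exposure is 1 for everyone. In the second, exposure is assigned independently of everything else, but its effect is 1 when `m` is 0 and 3 when `m` is 1 (effect modification).

```python
import random
from statistics import fmean

random.seed(2024)   # arbitrary seed, only for reproducibility
N = 200_000         # assumed sample size, large enough that sampling noise is small


def mean_diff(rows, stratum=None):
    """Mean outcome in the exposed minus mean outcome in the unexposed.

    rows holds (covariate, exposure, outcome) tuples. If stratum is given,
    only rows whose covariate equals it are used.
    """
    kept = [(a, y) for c, a, y in rows if stratum is None or c == stratum]
    exposed = fmean(y for a, y in kept if a == 1)
    unexposed = fmean(y for a, y in kept if a == 0)
    return exposed - unexposed


def simulate_confounding(n):
    """L affects both exposure A and outcome Y; the effect of A on Y is 1 in everyone."""
    rows = []
    for _ in range(n):
        l = int(random.random() < 0.5)
        a = int(random.random() < (0.8 if l else 0.2))  # exposure depends on L
        y = 1.0 * a + 2.0 * l + random.gauss(0, 1)      # outcome depends on A and L
        rows.append((l, a, y))
    return rows


def simulate_effect_modification(n):
    """A is assigned independently of M; the effect of A on Y is 1 if M=0 and 3 if M=1."""
    rows = []
    for _ in range(n):
        m = int(random.random() < 0.5)
        a = int(random.random() < 0.5)                  # exposure ignores M
        y = (1.0 + 2.0 * m) * a + random.gauss(0, 1)    # effect of A varies with M
        rows.append((m, a, y))
    return rows


conf = simulate_confounding(N)
conf_crude = mean_diff(conf)
conf_by_l = [mean_diff(conf, stratum=l) for l in (0, 1)]

em = simulate_effect_modification(N)
em_crude = mean_diff(em)
em_by_m = [mean_diff(em, stratum=m) for m in (0, 1)]

print("Confounding:         crude", round(conf_crude, 2),
      "| within L=0, L=1:", [round(d, 2) for d in conf_by_l])
print("Effect modification: crude", round(em_crude, 2),
      "| within M=0, M=1:", [round(d, 2) for d in em_by_m])
```

With these assumed parameters, the crude difference under confounding should come out near 2.2 even though the difference within each level of `l` is near the true effect of 1, so the crude comparison is biased and stratifying removes the bias. Under effect modification the crude difference should be near 2, which is a valid average effect, while the stratum-specific differences are near 1 and 3. There the stratified results are not correcting a bias; they are describing how the effect varies.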