# Conditional randomization, standardization, and inverse probability weighting

In a previous post, I began following the developments in Miguel Hernán and James Robins' soon-to-be-published book, *Causal Inference*. There I gave an overview of the first topics they cover, namely potential outcomes, causal effects, and randomization. In this post I'll continue, with some personal notes on the remaining parts of Chapter 2 of their book, on conditional randomization, standardization, and inverse probability weighting.

## Conditional randomization
In simple randomization in a trial, each unit or individual is randomized to one of (usually) two groups, entirely randomly. I'll refer to the two groups as the control and treatment group. In a conditionally randomized trial or experiment, we first stratify or split the sample according to a baseline stratification variable (or variables) $L$. In each of these strata, we randomize individuals to the control and treatment groups, and furthermore we may choose to use different randomization probabilities in the different strata. For example, if $L$ is binary, we may randomize individuals with $L=0$ to the treatment group with probability 0.75, but randomize those with $L=1$ to the treatment group with probability 0.25.
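This scheme is easy to simulate. The following is a minimal sketch, using the example probabilities above; the sample size, the distribution of $L$, and the variable names are all illustrative assumptions, not from the book.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Baseline stratification variable L (binary); assume half the sample per stratum
L = rng.binomial(1, 0.5, size=n)

# Conditional randomization: treatment probability 0.75 when L=0, 0.25 when L=1
p_treat = np.where(L == 0, 0.75, 0.25)
A = rng.binomial(1, p_treat)  # A=1 treatment, A=0 control
```

Within each stratum this is just an ordinary randomized trial, but the proportion treated differs across strata, which is exactly what makes the naive comparison of treated and control means problematic later on.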

For a regular, marginally randomized trial, Hernán and Robins make the important connection that this is equivalent to the potential outcomes being missing completely at random. In a conditionally randomized trial, the potential outcomes are instead missing at random, meaning that the probability that (say) the potential outcome under treatment is observed is independent of that potential outcome value, conditional on the variable(s) $L$.

At this point you may reasonably wonder why we would use such a randomization scheme! Well in reality, such a conditional randomization scheme would not often be used. As we shall see in the following, the randomization scheme makes things somewhat more complicated if we want to draw inferences about causal effects in the total (unstratified) population. Why then are we talking about conditional randomization? Because when we come to observational studies, where treatment is not randomized, to make progress we will pretend that the observational study is like a conditionally randomized trial. Here we will assume that whether or not an individual receives the treatment (rather than the control) is random, conditional on confounders $L$.

If we have data from a conditionally randomized trial, how can we estimate population causal effects?

## Standardization
The first approach described by Hernán and Robins is standardization. Suppose we are interested in estimating the average causal effect $E(Y^{a=1})-E(Y^{a=0})$ (we could be interested in another causal effect measure, e.g. the causal risk ratio). To estimate $E(Y^{a=1})$, we can use the law of total expectation to write

$E(Y^{a=1}) = \sum_{l=0,1} E(Y^{a=1}|L=l) P(L=l)$

To estimate $E(Y^{a=1}|L=l)$, we can use the observed mean outcome $E(Y|A=1,L=l)$ among those with $L=l$ who were randomized to treatment. These two quantities are equal because within each level of $L$ we have a regular, marginally randomized trial (albeit with a randomization probability that in general need not equal 0.5). The probability $P(L=l)$ can then be easily estimated by the sample proportion of individuals with $L=l$.
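Putting the pieces together, here is a sketch of the standardization estimator on simulated trial data. The data-generating mechanism is an assumption made for illustration: the outcome risk is $0.2 + 0.1a + 0.3l$, so the true average causal effect is 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated conditionally randomized trial (illustrative assumptions)
L = rng.binomial(1, 0.5, size=n)
A = rng.binomial(1, np.where(L == 0, 0.75, 0.25))
# Outcome depends on both treatment and L; true E(Y^{a=1}) - E(Y^{a=0}) = 0.1
Y = rng.binomial(1, 0.2 + 0.1 * A + 0.3 * L)

def standardized_mean(a):
    """Estimate E(Y^a) = sum_l E(Y | A=a, L=l) P(L=l)."""
    return sum(Y[(A == a) & (L == l)].mean() * np.mean(L == l) for l in (0, 1))

ate_std = standardized_mean(1) - standardized_mean(0)
```

With a large sample, `ate_std` should be close to the true value 0.1, whereas the naive contrast `Y[A == 1].mean() - Y[A == 0].mean()` would be biased because of the differential randomization probabilities.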

## Inverse probability weighting
To estimate the potential outcome mean under treatment, $E(Y^{a=1})$, we might, as in a regular marginally randomized trial, consider using the mean of the observed outcome among those randomized to treatment, $E(Y|A=1)$. This, however, would be a biased estimate. Why? Because we randomized to treatment with different probabilities depending on the value of $L$, those randomized to treatment have a different distribution of $L$ from those randomized to control. If $L$ also affects the outcome, we would obtain a biased estimate due to confounding by $L$.

How then can we validly estimate the mean outcome under treatment? The second approach described by Hernán and Robins is inverse probability weighting. In this approach, we calculate the mean of the observed outcomes among those who were treated, weighting each individual by the reciprocal of the probability that they would have been treated. Earlier, we said that in our conditionally randomized trial, we might randomize those with $L=0$ to the treatment group with probability 0.75, and those with $L=1$ with probability 0.25. In this case, when calculating the mean of the observed outcomes to estimate $E(Y^{a=1})$, we would weight those with $L=0$ by $1/0.75$ and those with $L=1$ by $1/0.25$. In essence, inverse probability weighting assigns weights to those who were treated such that their distribution of $L$ matches the population distribution of $L$. By doing this we obtain a consistent estimate of the population mean potential outcome under treatment, $E(Y^{a=1})$.

We can then do the same among those randomized to control, in order to estimate $E(Y^{a=0})$, where here we weight by the reciprocal of the probability of being randomized to control. This weighting ensures that the weighted sample of control individuals has a distribution of $L$ which matches the population distribution of $L$. Now that we have valid estimates of $E(Y^{a=0})$ and $E(Y^{a=1})$, we can calculate the causal effect measure of interest, e.g. $E(Y^{a=1})-E(Y^{a=0})$.
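The IPW estimator for both arms can be sketched as follows, again on simulated data. The data-generating mechanism (outcome risk $0.2 + 0.1a + 0.3l$, so true effect 0.1) and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Same simulated conditionally randomized trial as before (illustrative)
L = rng.binomial(1, 0.5, size=n)
p_treat = np.where(L == 0, 0.75, 0.25)
A = rng.binomial(1, p_treat)
Y = rng.binomial(1, 0.2 + 0.1 * A + 0.3 * L)

# Probability of the treatment actually received, P(A=a|L):
# p_treat for the treated, 1 - p_treat for the controls
p_received = np.where(A == 1, p_treat, 1 - p_treat)
w = 1.0 / p_received  # inverse probability weights

# Weighted means of observed outcomes estimate E(Y^{a=1}) and E(Y^{a=0})
ipw_1 = np.average(Y[A == 1], weights=w[A == 1])
ipw_0 = np.average(Y[A == 0], weights=w[A == 0])
ate_ipw = ipw_1 - ipw_0
```

Note that `np.average` with weights normalizes by the sum of the weights within each arm; this (Hájek-style) version of the estimator is consistent for the population means, and in large samples `ate_ipw` should agree closely with the standardization estimate, since here the two approaches use the same information.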