The missing at random (MAR) assumption plays an extremely important role in the analysis of datasets subject to missing data. Its importance lies primarily in the fact that if we are willing to assume data are MAR, we can identify (estimate) target parameters from the observed data. There are a variety of methods for handling data which are assumed to be MAR. One approach is estimation of a model for the variables of interest using the method of maximum likelihood. In the context of randomised trials, primary analyses are sometimes based on methods which are valid under MAR, such as linear mixed models (MMRM). A key concern, however, is whether the MAR assumption is plausible in any given situation.
A common type of missing data in randomised trials is due to dropout – once a patient drops out, their follow-up data are (often) unavailable for the subsequent planned follow-up visits. If missingness is only due to dropout, the MAR assumption can be shown to be equivalent to the following condition: among those patients who had not dropped out by visit t-1, the probability of dropping out before visit t may depend on data measured at visit t-1 and preceding visits, but given these, does not depend on the possibly unobserved visit t observations (or later observations).
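The condition just described can be written compactly. The notation here is my own assumption rather than anything standard to this post: Y_1, ..., Y_T are the planned measurements, and D is the first visit at which a patient is missing, so that D ≥ t means the patient was still in the study at visit t-1 and D = t means they dropped out before visit t. Under these conventions, a sketch of MAR for dropout is:

```latex
% Assumed notation: Y_1,\dots,Y_T are the planned outcomes and D is the
% first visit at which the patient is missing (D = t: dropped out
% between visit t-1 and visit t).
\[
  P(D = t \mid D \ge t,\; Y_1, \dots, Y_T)
  \;=\;
  P(D = t \mid D \ge t,\; Y_1, \dots, Y_{t-1}),
  \qquad t = 2, \dots, T .
\]
```

The left-hand side conditions on all planned measurements, including those that may be unobserved; the right-hand side conditions only on those measured up to visit t-1.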
Last year I was asked an interesting question on the missing data Google group that I maintain. The question was whether MAR is plausible in a longitudinal trial with a particular setup: consider those patients who attend visit t-1 (and hence have not yet dropped out). Suppose measurements are taken on these patients at visit t-1, and based on these, and possibly past measurements, a decision is made as to whether the patient drops out or continues in the study.
On the face of it, the missingness caused by dropout would seem to be MAR, since missingness/dropout depends (in a causal sense) only on observed data (data recorded at visit t-1, and possibly earlier visits). Unfortunately, this logic (I believe) is not necessarily sound. Suppose, as will often be the case, that the distribution of the outcome at time t differs between those patients who are still in the study at visit t and those who dropped out after visit t-1, even after conditioning/adjusting for past data. In this case, because the indicator of whether a patient drops out between visit t-1 and visit t is predictive of the outcome at time t, even conditional on the past information, the MAR assumption will not hold.
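To make this step explicit, here is a short sketch in notation that is my own assumption: Y_t is the outcome at visit t, H_{t-1} = (Y_1, ..., Y_{t-1}) is the history up to visit t-1, and D is the first visit at which the patient is missing. By Bayes' theorem, saying that the dropout probability does not depend on Y_t given the history is the same as saying that Y_t has the same conditional distribution in those who drop out and those who continue:

```latex
% Assumed notation: H_{t-1} = (Y_1,\dots,Y_{t-1}); D = first missing visit.
\[
  P(D = t \mid D \ge t,\; H_{t-1},\; Y_t)
  = P(D = t \mid D \ge t,\; H_{t-1})
  \quad\Longleftrightarrow\quad
  f(y_t \mid H_{t-1},\; D = t) = f(y_t \mid H_{t-1},\; D > t).
\]
```

So if the two conditional distributions on the right differ, the equality on the left fails, and with it MAR, regardless of the mechanism that produced the dropout.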
A somewhat contrived example of such a situation would be where, at visit t-1, each patient who has not yet dropped out tosses a coin to decide whether they will now drop out of the study. Next, suppose that if they drop out, they are no longer able to receive the intervention to which they were originally randomised. Lastly, suppose that the outcome following dropout differs in distribution from that of patients who did not drop out, even after adjusting for past measurements. This might be expected because, in contrast to the patients who did not drop out, they are no longer receiving their randomised intervention. Because of this, missingness will be associated with the outcome value at time t, even after adjusting for past data, such that MAR will not hold. This will be the case even though the missingness was generated by a purely random coin toss.
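The coin-toss scenario can be illustrated with a small simulation. This is only a sketch under assumptions of my own: two visits, normally distributed outcomes, and an additive intervention effect of 2 that is received only by patients who stay in the study. The function names are hypothetical. Because the data are simulated, the outcome at visit t is known for everyone, including dropouts, which would of course not be the case in a real trial.

```python
import numpy as np


def simulate_coin_toss_dropout(n=100_000, treatment_effect=2.0, seed=0):
    """Simulate visit t-1 and visit t outcomes with coin-toss dropout.

    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    y_prev = rng.normal(0.0, 1.0, size=n)   # outcome at visit t-1
    dropout = rng.integers(0, 2, size=n)    # fair coin: 1 = drops out
    noise = rng.normal(0.0, 1.0, size=n)
    # Only patients who stay in the study keep receiving the intervention.
    y_next = y_prev + treatment_effect * (1 - dropout) + noise
    return y_prev, dropout, y_next


def dropout_coefficient(y_prev, dropout, y_next):
    """Least-squares coefficient of dropout in y_next ~ 1 + y_prev + dropout."""
    design = np.column_stack([np.ones_like(y_prev), y_prev, dropout])
    coef, *_ = np.linalg.lstsq(design, y_next, rcond=None)
    return coef[2]


y_prev, dropout, y_next = simulate_coin_toss_dropout()

# Dropout is a pure coin toss, so it is unrelated to the past outcome.
corr_with_past = np.corrcoef(y_prev, dropout)[0, 1]

# Yet, given the past outcome, dropout still predicts the visit t outcome.
coef_dropout = dropout_coefficient(y_prev, dropout, y_next)

print(f"correlation of dropout with visit t-1 outcome: {corr_with_past:.3f}")
print(f"dropout coefficient adjusting for visit t-1:   {coef_dropout:.3f}")
```

In this setup the correlation between dropout and the earlier outcome should be close to zero, while the adjusted dropout coefficient should be close to minus the assumed intervention effect. That is the pattern described above: dropout generated completely at random, but an outcome at time t that depends on dropout status given the past.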
If anyone has thoughts on the above, or thinks there is a flaw in my logic, please add a comment.