Randomized clinical trials often involve some sort of clustering. The most obvious case is a cluster randomized trial, where clusters form the unit of randomization. It is well known that in this case the clustering must be allowed for in the analysis. But even in the common setting where individuals are randomized, clustering may be present. Perhaps the most common situation is where a trial involves a number of hospitals or centres, and individuals are recruited into the trial when they attend their local centre. Another example is where the intervention is administered to each individual by some professional (e.g. surgeon, therapist), such that outcomes from individuals treated by the same professional may be more similar to each other. In both of these situations, an obvious question is whether we need to allow for the clustering in the analysis.
This post draws heavily upon the findings of two excellent papers: 1) “Does clustering affect the usual test statistics of no treatment effect in a randomized clinical trial?” by Parzen, Lipsitz and Dear (accessible here), and 2) “Assessing potential sources of clustering in individually randomised trials” by Kahan and Morris (accessible here).
Examples of clustering in trials
As mentioned earlier, cluster randomized trials, in which the cluster itself is the unit of randomization, are an obvious example of clustering. By clustering we mean that individuals belong to clusters, and because of this are (usually) more similar to each other than to individuals in other clusters. An example of this would be a trial which enrolls schools to compare two educational interventions. Each school is randomized to receive one of the interventions, and test scores for each child in each school are recorded after a suitable period to assess the efficacy of the interventions. Even in the absence of the interventions, we would expect the test scores of two children in the same school to be more similar to each other (i.e. they’re correlated) than those of two children from different schools, since children from the same school may share a number of characteristics by virtue of going to the same school (educational attainment may differ from school to school, for a whole host of reasons). It is well known that in a cluster randomized trial, the clustering must be allowed for in the analysis.
However, as noted in the opening of the post, clustering arguably arises much more frequently in randomized controlled trials than we might at first think. Often clinical trials are conducted by having a number of hospitals or treatment centres agree to participate in the trial. Each centre then recruits patients into the trial. As in the cluster randomized school trial, patients’ outcomes might be correlated with each other because of factors shared by patients attending the same centre (environmental exposures, socio-demographic background, other health factors). Nonetheless, analyses of such trials sometimes (perhaps more often than not) ignore this clustering. For example, with a continuous outcome, a simple two-sample t-test might be performed, comparing the mean outcomes between the two treatment groups.
Another fairly common feature in trials is that one or both interventions are administered by individual professionals (e.g. surgeons or therapists). In the case of a surgical trial, the proficiency of surgeons might be variable, such that outcomes from two patients operated on by the same surgeon are more similar to each other than the outcomes from two patients who had different surgeons.
When can clustering be ignored in the analysis?
At least within the world of clinical trials, clustering due to the fact that the trial is conducted through centres, or due to the fact that the interventions are administered by different professionals, is quite common. Kahan and Morris note however that the analyses of such trials often ignore these sources of clustering. The question then arises as to whether such analyses are valid.
In their paper, Parzen, Lipsitz and Dear consider the case of trials conducted in a number of centres. They consider two types of randomization. The first, simple randomization (they term it complete randomization), is where each patient is simply randomized to each treatment with probability 0.5. The second is permuted block randomization, where within blocks (of a chosen size), exactly half of the patients are given one treatment and the other half the other treatment. The purpose of permuted block randomization is to try to ensure that the proportion of patients randomized to each treatment is close to 0.5 in each centre. If each centre only recruits a small number of patients and simple randomization is used, there is a non-negligible chance that all of a centre’s patients may be randomized to the same treatment.
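To make the two schemes concrete, here is a minimal Python sketch. The function names, the block size of 4 and the seed are my own illustrative choices (not taken from Parzen, Lipsitz and Dear), and real trials would use a dedicated randomization system rather than code like this.

```python
import random

def simple_randomization(n_patients, rng):
    """Allocate each patient independently to arm 0 or 1 with probability 0.5."""
    return [rng.randrange(2) for _ in range(n_patients)]

def permuted_block_randomization(n_patients, block_size, rng):
    """Allocate patients in shuffled blocks holding equal numbers of each arm."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even")
    allocation = []
    while len(allocation) < n_patients:
        block = [0, 1] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_patients]  # the final block may be left incomplete

def prob_all_same_arm(n_patients):
    """Chance that simple randomization puts a whole centre on one arm."""
    return 2 * 0.5 ** n_patients

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
centre_simple = simple_randomization(4, rng)
centre_blocked = permuted_block_randomization(4, block_size=4, rng=rng)
print(centre_simple, centre_blocked, prob_all_same_arm(4))
```

For a centre recruiting only 4 patients, `prob_all_same_arm(4)` gives 0.125, so one centre in eight would be expected to end up with a single arm under simple randomization, whereas a complete permuted block always splits 2 and 2.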
For continuous outcomes, where a two-sample t-test is used to compare the mean outcome between the two treatment groups (ignoring clustering), Parzen, Lipsitz and Dear show that (asymptotically) the estimated treatment effect is unbiased and the type 1 error rate is correct when simple randomization is used. So in this case, ignoring the clustering in the analysis doesn’t affect the validity of the analysis. Note however that even in this case, allowing for clustering in the analysis (see below for possible approaches to this) will result in a more efficient estimate (more power to detect a treatment effect).
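The efficiency point can be illustrated with a small simulation. The sketch below is not a mixed model: as a simple stand-in it compares the unadjusted difference in means with a centre-stratified estimate (a weighted average of within-centre differences). The 20 centres of 10 patients, the variance components and the true effect of 0.5 are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_estimates(rng, n_centres=20, per_centre=10,
                       sd_centre=1.0, sd_resid=1.0, effect=0.5):
    """Return (unadjusted, centre-stratified) treatment estimates for one trial."""
    y_by_arm = ([], [])
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_centres):
        centre_effect = rng.gauss(0.0, sd_centre)  # shared by the centre's patients
        centre_y = ([], [])
        for _ in range(per_centre):
            arm = rng.randrange(2)  # simple randomization
            y = centre_effect + effect * arm + rng.gauss(0.0, sd_resid)
            centre_y[arm].append(y)
            y_by_arm[arm].append(y)
        n0, n1 = len(centre_y[0]), len(centre_y[1])
        if n0 and n1:  # single-arm centres carry no within-centre comparison
            w = n0 * n1 / (n0 + n1)
            diff = statistics.fmean(centre_y[1]) - statistics.fmean(centre_y[0])
            weighted_sum += w * diff
            weight_total += w
    unadjusted = statistics.fmean(y_by_arm[1]) - statistics.fmean(y_by_arm[0])
    return unadjusted, weighted_sum / weight_total

def compare_precision(n_sims=1000, seed=7):
    """Empirical mean and SD of both estimators across simulated trials."""
    rng = random.Random(seed)
    pairs = [simulate_estimates(rng) for _ in range(n_sims)]
    unadj = [p[0] for p in pairs]
    strat = [p[1] for p in pairs]
    return {
        "mean_unadjusted": statistics.fmean(unadj),
        "mean_stratified": statistics.fmean(strat),
        "sd_unadjusted": statistics.stdev(unadj),
        "sd_stratified": statistics.stdev(strat),
    }

result = compare_precision()
print(result)
```

Under these assumptions both estimators should centre on the true effect, but the stratified one should vary less from trial to trial, because the centre effects cancel out of the within-centre comparisons.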
In the case of permuted block randomization, and assuming that outcomes are positively (rather than negatively) correlated within clusters (which would usually be the case), inferences are conservative. That is, if the null hypothesis (no treatment effect) is true, the type 1 error rate of a nominal 5% test will be less than 5%. On the other side of the coin, our power to detect a treatment effect will be reduced, and confidence intervals will be unnecessarily wide.
Thus, in both these cases, ignoring the clustering will at worst lead to conservative inferences. Having said that, given the large investments required, both personal and financial, to conduct a trial, it is arguably the obligation of researchers not to squander precision unnecessarily – i.e. we should try and extract the most precise estimate of treatment effect from our trial.
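The contrast between the two randomization schemes can be checked by simulation. The following sketch generates trials under the null hypothesis with a shared centre effect, analyses them with a pooled two-sample statistic that ignores centre, and counts rejections. The settings (20 centres, 10 patients each, equal between-centre and residual variance, one block per centre) are my assumptions, and I use the normal critical value 1.96 as a large-sample approximation to the t-test.

```python
import math
import random
import statistics

def allocate(per_centre, blocked, rng):
    """Return arm labels (0/1) for the patients of one centre."""
    if blocked:
        arms = [0, 1] * (per_centre // 2)  # one permuted block per centre
        rng.shuffle(arms)
        return arms
    return [rng.randrange(2) for _ in range(per_centre)]  # simple randomization

def pooled_test_statistic(a, b):
    """Two-sample statistic with pooled variance, ignoring clustering."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def type1_error(blocked, n_sims=2000, n_centres=20, per_centre=10,
                sd_centre=1.0, sd_resid=1.0, seed=11):
    """Proportion of null trials rejected at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = ([], [])
        for _ in range(n_centres):
            centre_effect = rng.gauss(0.0, sd_centre)
            for arm in allocate(per_centre, blocked, rng):
                # no treatment effect: the outcome does not depend on arm
                groups[arm].append(centre_effect + rng.gauss(0.0, sd_resid))
        if abs(pooled_test_statistic(*groups)) > 1.96:
            rejections += 1
    return rejections / n_sims

rate_simple = type1_error(blocked=False)
rate_blocked = type1_error(blocked=True)
print(rate_simple, rate_blocked)
```

With these settings we would expect the rejection rate under simple randomization to sit close to the nominal 5%, and the rate under permuted blocks to fall well below it, in line with the conservative behaviour described above.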
Parzen, Lipsitz and Dear also argue that the preceding results carry over to the case of a binary outcome, analysed by a z-test comparing the proportion of ‘positive’ outcomes in the two groups, and to the log-rank test in the case of a censored time to event outcome.
Kahan and Morris considered the important case of post-randomization clustering, whereby patients are randomized to a treatment and then assigned to a particular professional (e.g. surgeon or therapist). Here the patients are considered clustered within their surgeon (for example). In this case, if the probability of a patient being assigned to each surgeon is the same irrespective of which treatment group they have been randomized to, the clustering can again be ignored in the analysis and the type 1 error rate will be correct.
However, if these probabilities of assignment differ between the treatment groups, the clustering cannot be ignored. This would be the case for example if each surgeon only treated patients from one treatment arm. Furthermore, in this case the type 1 error rate will be inflated, leading to rejection of the null hypothesis (when it is true) in more than 5% of trials. Correspondingly, confidence intervals will be narrower than they should be. In this setting, simulations by Kahan and Morris show that the type 1 error rate could be as high as 20%.
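A rough simulation shows how the two situations differ. This is my own sketch, not a reproduction of Kahan and Morris’s simulations: I assume 200 patients, 20 surgeons, and a surgeon standard deviation of 1/3 against a residual standard deviation of 1 (an intracluster correlation of about 0.1), and again use 1.96 as a large-sample critical value.

```python
import math
import random
import statistics

def pooled_test_statistic(a, b):
    """Two-sample statistic with pooled variance, ignoring clustering."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def type1_error(arm_specific, n_sims=2000, n_patients=200, n_surgeons=20,
                sd_surgeon=1 / 3, sd_resid=1.0, seed=5):
    """Proportion of null trials rejected when surgeon clustering is ignored."""
    rng = random.Random(seed)
    half = n_surgeons // 2
    rejections = 0
    for _ in range(n_sims):
        surgeon_effect = [rng.gauss(0.0, sd_surgeon) for _ in range(n_surgeons)]
        groups = ([], [])
        for _ in range(n_patients):
            arm = rng.randrange(2)  # the patient is randomized first
            if arm_specific:
                # first half of surgeons treat only arm 0, second half only arm 1
                surgeon = arm * half + rng.randrange(half)
            else:
                # same assignment probabilities whatever the arm
                surgeon = rng.randrange(n_surgeons)
            groups[arm].append(surgeon_effect[surgeon] + rng.gauss(0.0, sd_resid))
        if abs(pooled_test_statistic(*groups)) > 1.96:
            rejections += 1
    return rejections / n_sims

rate_shared = type1_error(arm_specific=False)
rate_arm_specific = type1_error(arm_specific=True)
print(rate_shared, rate_arm_specific)
```

Under these assumptions the rejection rate should stay near 5% when surgeons treat both arms with equal probability, and rise well above it when each surgeon treats only one arm, since differences between surgeons are then confounded with the treatment comparison.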
Adjusting for clustering in the analysis
When clustering in a trial is non-ignorable (as defined by Kahan and Morris), ignoring clustering in the analysis could lead to invalid inferences. Even when it is ignorable, ignoring it results in (at least to some extent) inefficient estimates of treatment effect. In both cases, a number of possible approaches might be taken to account for the clustering in the analysis.
In the case of continuous outcomes, linear mixed models (sometimes also called random-effects models) can be used (see here for another paper by Kahan and Morris on this in the context of multi-centre trials). In the case of a trial where patients are recruited by centres, a centre random-effect can be included in the model. In the case of post-randomization clustering, e.g. by surgeon, surgeon can be added as a random-effect.
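For concreteness, a random-intercept model of this kind can be written as follows. The notation is my own and is only a sketch of the simplest version: y_ij is the outcome of patient i in cluster j (centre or surgeon), x_ij is the treatment indicator, u_j is the cluster random effect, and ρ is the intracluster correlation.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

Here β_1 is the treatment effect, and two patients in the same cluster have correlated outcomes because they share the same u_j.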
With binary outcomes (see here for a paper on this case), random-effects models can again be used. For binary outcomes, however, random-effects models are more computationally intensive to fit, and may sometimes encounter convergence problems. Nonetheless, in the context of multi-centre RCTs, this issue occurred rarely in simulations by Kahan. An alternative approach is to use generalized estimating equations, based on an exchangeable correlation structure assumption. Here simulations by Kahan showed that using the model-based variance estimator was preferable to the robust sandwich variance estimator, particularly when the number of centres was small. Kahan’s simulation study showed that including centre as a fixed effect sometimes led to inflated type 1 error rates, although it may be a preferable approach when there are only a few centres.
For survival or time to event outcomes, I am not aware of any analyses evaluating the performance of different approaches. However, for the Cox proportional hazards model one can include centre either as a fixed or random effect. Alternatively, centre can be ignored, but a robust sandwich variance estimator used.