Is stratified randomisation in trials (at least large ones) pointless?

I recently wrote a blog post giving some comments about the revised FDA guidance on covariate adjustment in randomised trials. One of the things I commented on was recent work by different authors on the impacts of stratified randomisation on inference. It is well known (or should be!) that if stratified randomisation is used, the statistical analysis should adjust for the variables used in the stratified randomisation in order for the efficiency gain to be realised.

A fact that was new to me is that if in the analysis one adjusts (in linear models for continuous outcomes) for dummy variables corresponding to each of the randomisation strata, the true variance of the resulting estimator of treatment effect is the same whether simple (non stratified) or stratified randomisation is used, in the most common situation that randomisation to the two arms is 1:1. This was shown by one of the results of Bugni et al 2018. Wang et al 2019 subsequently showed it also holds if additional baseline covariates (not used in the stratified randomisation) are also adjusted for, and that it also holds for the standardisation type estimator of the marginal risk difference based on a logistic regression working model.

These results mean that, at least for large sample sizes, if one considers strata defined by baseline covariates, provided you adjust for these strata as covariates in the analysis, performing the randomisation stratified on these gains you no additional efficiency compared to using simple randomisation. These theoretical results are in agreement with simulation results mentioned on Twitter today by Jack Wilkinson that prompted this post:

The theoretical results in the above papers are asymptotic results. Thus I can imagine it could well be the case that with small sample sizes stratified randomisation does buy you some additional efficiency.

Moreover, in practice I believe it is more common in the analysis model to adjust for only main effects of the variables used to define the randomisation strata, rather than dummy variables for each of their combinations. My guess is that if an assumption that the outcome only depends on these variables via their main effects is correct, the theoretical results mentioned above that imply no (asymptotic) benefit to using stratified randomisation would also hold true for this type of analysis model.

I don’t have time to actively investigate this myself right now, but if anyone is interested in pursuing it and leading on the work for this, please get in touch.

Postscript – a small simulation illustration

The following small simple simulation study in R follows. The setup is very simple – one binary baseline covariate (X) which influences the outcome and either is ignored in the randomisation (simple randomisation) or randomisation is performed stratified on it to ensure balance. In both cases, the analysis is a linear regression adjusting for treatment (Z) and this baseline covariate (X).

library(blockrand)

n <- 250

nSim <- 10000
est <- array(0, dim=nSim)

#simple randomisation
for (sim in 1:nSim) {

  #simulate binary baseline covariate
  x <- 1*(runif(n)<0.5)
  #simulate treatment assignment, using simple randomisation
  z <- 1*(runif(n)<0.5)
  
  y <- x+z+rnorm(n)
  mod <- lm(y~x+z)
  est[sim] <- coef(mod)[3]
}

#look at empirical SE of estimates
sd(est)

#stratified block randomisation

for (sim in 1:nSim) {
  
  x <- 1*(runif(n)<0.5)
  #stratified block randomisation
  z <- rep(0,n)
  n0 <- sum(x==0)
  z[x==0] <- as.numeric(blockrand(n=n0, num.levels=2)$treatment)[1:n0]-1
  n1 <- sum(x==1)
  z[x==1] <- as.numeric(blockrand(n=n1, num.levels=2)$treatment)[1:n1]-1
  
  y <- x+z+rnorm(n)
  mod <- lm(y~x+z)
  est[sim] <- coef(mod)[3]
}

sd(est)

The empirical SE from simple randomisation (based on 10,000 simulations) was 0.1259364 and for stratified randomisation was 0.1254624. This shows that, at least in this setup, the stratified randomisation, does not materially reduce the (true) variability of the treatment effect estimates.

These results are in accordance with a 1982 paper I just came across: ‘A note on stratifying versus complete random assignment in clinical trials’ by Grizzle.

12 thoughts on “Is stratified randomisation in trials (at least large ones) pointless?”

  1. Jonathan, I’ve just skimmed this and haven’t digested yet but at a glance it doesn’t make sense to me, so I’ll write down my confusion…
    You note above that ‘if stratified randomisation is used, the statistical analysis should adjust for the variables used in the stratified randomisation in order for the efficiency gain to be realised’. Exactly! We expect e.g. overcoverage due to a mismatch between the empirical SE and model SE. This doesn’t (I think) go away with large sample sizes*, which implies that stratified randomisation still reduced the empirical SE compared with simple randomisation. And if there were a ‘catching up’ of simple randomisation as n increases then that would surely also lead to overcoverage – which we know doesn’t happen.

    *Figures 1 and 2 of Anthony Atkinson’s paper ‘Optimum biased-coin designs for sequential treatment allocation with covariate information’ spring to mind, where the loss of simple randomisation vs. other methods actually gets worse as n increases.
    https://onlinelibrary.wiley.com/doi/abs/10.1002/(SICI)1097-0258(19990730)18:14%3C1741::AID-SIM210%3E3.0.CO;2-F

    Reply
    • Thanks Tim. What analysis model did you have in mind? I probably wasn’t very clear in the post about what I meant. Imagine you have one binary baseline covariate, and you are going to adjust for it (and treatment group) in the statistical analysis (whichever randomisation scheme you use). Now compare the the true/empirical SE of the estimator under simple randomisation vs. stratified randomisation (stratified on the binary baseline variable). This theory says (I think) asymptotically the empirical SE of this estimator will be the same (and in particular no lower) whether you use stratified randomisation or not. I.e. once you commit to adjusting for it in the analysis model, there is no (asymptotically at least) benefit to using the more complex stratified randomisation.

      Reply
  2. Thanks Jonathan. I can’t reply to your comment above so this is in response to ‘Thanks Tim. What analysis model did you have in mind?’ Yes, I had in mind what you wrote (i.e. both adjusted).

    After a bit of thought, I have some intuition. In Atkinson’s paper above, he quantified loss as ‘number of participants on whom information is lost due to imbalance’. In his figure 1, loss (eventually) reaches ~5 for simple randomisation. But this is an absolute number and the results of Bugni and of Wang are not in the same terms. If they took Atkinson’s loss of 5, the difference in variance with n–5 vs. n participants becomes vanishingly small as n goes to infinity. (I realise this is very hand-wavy!)

    Here’s code showing how this works out for loss = 5:
    # Stata
    twoway function (1/(x-5)) / (1/x), range(10 100)
    # R
    eq = function(x){(1/(x-5))/(1/x)}
    plot(eq(10:100), type=’l’)

    Reply
  3. This is really interesting! I’ve always had the mantra ‘analyse as you randomise’ so if you stratify by a discrete covariate you should include it in the analysis, but because of randomisation within strata there will be little effect of including the covariate in the model on the treatment estimate. (Interestingly with this mantra really one should also include blocks in block randomisation in the analysis, which I must confess I don’t do).
    One assumes the ‘true’ mode includes the covariate but that is an assumption one makes when doing stratified randomisation. Omitting the covariate is a ‘collapsed’ model and not all estimates for treatment effects are ‘collapsible’.
    . What I wondered was about multi-centre trials. I generally suggest stratifying by centre but then wondered about including centre effect in the analysis. If centre effect is included should it be as a fixed effect or a random effect?
    Thanks

    Reply
  4. The theory was covered by Emmanuel Lesaffre and me in

    Lesaffre E, Senn S. A note on non-parametric ANCOVA for covariate adjustment in randomized clinical trials. Statistics in Medicine. 2003;22(23):3583-3596.

    As you note, it’s not quite true that there is no difference. It is asymptotically true. The lower bound for the variance is proprtional to n/(n_1 x n_2) x lambda where n_1 and n_2 are the sample sizes in the two arms and n = n_1+n_2 and lambda is the penalty for loss of orthogonality. For a formula for lambda see

    Senn SJ. Modelling in drug development. In: Christie M, Cliffe A, Dawid AP, Senn SJ, eds. Simplicity Complexity and Modelling. Chichester: Wiley; 2011:35-49.

    However, the really important work on this Carl-Fredrik Burman’s PhD thesis

    Burman C-F. On Sequential Treatment Allocations in Clinical Trials [PhD]. Gothenburg: Department of Mathematics, Chalmers University of Technology; 1996.

    The expected loss compared to perfect balance is about one patient per covariate so not really important for moderate;y sized trials. this is one of the reasons why I hate minimisation: a pointless and overhyped procedure.

    I like to think that I am responible for Anthony Atkinson investigating the loss. He remarked that he thought it was a scandal that medical statisticians didn’t use his biased coin approach. I said to him it’s optimal but how much better is it than randomisation. Not much, it turns out.

    Reply
  5. Here’s my understanding of the excellent discussion. I hope someone will correct me, then I have a question.

    * adjustment for blocking/stratification variables is essential to have in the model
    * blocking during randomization has a low probability of helping
    * the way in which it would help is that it makes (in a linear model) the covariance matrix off diagonal terms that involve treatment be close to zero, which lowers the variance of the treatment effect estimate; i.e. it makes the stratification variable not collinear with treatment

    Now my question: suppose that there is only a 0.02 chance in an unstratified randomization for a noticeable imbalance in an important baseline covariate. Then you would be right, on the average, to ignore this problem. But in the 0.02 of trials in which it does occur, stratified randomization would have prevented damage to the treatment effect variance term. So why not stratify as an insurance policy?

    Reply
    • Thanks Frank, and all for all the great contributions.

      Frank wrote that blocking has a low probability of helping, but as far as I understand now this statement applies for large trials, but not small ones. As the trial gets smaller, the potential for stratified randomisation to be helpful is greater.

      As to the pertinent and good question of ‘why not use stratified randomisation?’: the only thing that comes to mind is that from a practical/logistical perspective it might be easier to use simple randomisation. Someone with actual experience of organising the randomisation (i.e. not me!) may be able to usefully comment on this point.

      Reply
  6. It might also be worth looking at Senn SJ, Anisimov VV, Fedorov VV. Comparisons of minimization and Atkinson’s algorithm. Statistics in Medicine. 2010;29(7-8):721-730. https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3763 This discusses the issue of efficiency in section 2. This shows that where more than one stratifying factor is involved, it is actually not necessary for the strata defined by the combinations to be balanced, provided that only main effects (and not interactions) of the covariates will be adjusted for. An example is the following design
    Treatment Sex Steroids used
    Experimental Female No
    Experimental Male Yes
    Control Female Yes
    Control Male No

    with n patients allocated to each of the four combination shown.This is balanced by sex and by steroid use but not by their combination.

    This paper also shows that the loss involved in randomisation compared to perfect balance (which is usually unatainable) is typically less than one patient per covariate. The comparsion is made assuming that whether or not stratisfication has taken place, the covariate is adjusted for. Unfortunately many of the comparisons of minimisation and randomisation have failed to use the same model for analysis, thus failing to separate design and analysis issues. this paper also shows that Atkinson’s algorithm is superior to minimisation.

    Reply
  7. This is really interesting thanks, surprising and thought provoking. I’ve started to do some regressions and I’m not sure stratified randomisation has any benefits on power, bias or type 1 error rates (maybe things depend on sample size, effect of strata covariates on outcome, their intercorrelation). Are power, bias or type 1 error rates the metrics we should base a design on ? So why do we continue to do stratified randomisation given it must have a cost (eg logistical). If there is no reason to do it will guidance from eg ICH or FDA/EMA or (UK) https://www.ct-toolkit.ac.uk/routemap/trial-planning-and-design/ be likely to change – thanks again for this interesting article and discussion

    Reply
  8. A very interesting discussion indeed. All this pertains, presumably, to a linear regression modelling framework. I have been investigating whether the same applies to Poisson regression and Negative Binomial regression. It turns out, based on simulation, that it does. If anyone is interested, try running the code below (3 covariates, one with six levels and the others dichotomous, sample size of 150 and 10k iterations). It would be an interesting exercise to establish this analytically.

    library(ggplot2);
    library(MASS);
    library(dplyr);

    mean.sd <- function(vec, digits = 4, digits.sd = digits – 1, cnt = FALSE) {
    mn % signif(digits = digits)
    std.dev % signif(digits = digits.sd);
    if (!cnt) str <- paste(mn, '(SD =', std.dev, ')')
    if (cnt) str <- paste(mn, '(SD =', std.dev, 'n =', length(vec), ')')
    str;
    }

    ## Set the sample size, and the covariate effect size and treatment effect sizes ####
    nn <- 150;
    n.sites <- 6;
    eff.sz <- 1.5;
    tx.names <- c('SIRO', 'PLAC')
    prior.SCC.cnt =10′, ‘<10')
    Use.5FU <- c('No', 'Yes')
    Site.name <- LETTERS[1:n.sites]

    risk.eff <- 0.5*c(-1,1)
    Use.5FU.eff <- 0.5*c(-1,1);
    Site.eff <- 2*(1:n.sites – n.sites/2 – 0.5)/2
    tx.eff <- eff.sz*c(-1,1);

    ## List of models to be simulated ####
    mdl.types <- c('gaussian', 'poisson', 'quasipoisson');
    disper <- 2;

    ## Perform simulation ####
    iters <- 10000;
    se.0.strat <- se.1.strat <- est.0.strat <- est.1.strat <- disp.0.strat <- disp.1.strat <- numeric(iters)
    se.0.simp <- se.1.simp <- est.0.simp <- est.1.simp <- disp.0.simp <- disp.1.simp <- numeric(iters)
    st <- proc.time()['elapsed'];
    for (mdl.type in mdl.types) {
    et <- proc.time()['elapsed'];
    cat('\nModelling', mdl.type, ', time elapsed', round(et-st), 'secs. \n')
    for (ii in 1:iters) {
    dta <- data.frame(ix = 1:nn);
    dta$risk <- prior.SCC.cnt[1+rbinom(nn,1, 0.5)];
    dta$Use.5FU <- Use.5FU[1+rbinom(nn,1, 0.5)];
    dta$Site <- Site.name[ceiling(runif(n = nn)*n.sites)];
    dta$stratum <- paste(dta$Site, dta$Use.5FU, dta$risk)
    ## Assign treatments using simple randomisation, and then by stratified randomisation
    dta$tx.simp <- rep(tx.names, length.out=nn);
    dta % group_by(stratum) %>% mutate(tx.strat = rep(sample(tx.names), length.out=length(stratum)));

    log.lamb.simp <- tx.eff[match(dta$tx.simp, tx.names)] +
    risk.eff[match(dta$risk, prior.SCC.cnt)] +
    Use.5FU.eff[match(dta$Use.5FU, Use.5FU)] +
    Site.eff[match(dta$Site, Site.name)];
    dta$lambd.simp <- exp(log.lamb.simp)

    log.lamb.strat <- tx.eff[match(dta$tx.strat, tx.names)] +
    risk.eff[match(dta$risk, prior.SCC.cnt)] +
    Use.5FU.eff[match(dta$Use.5FU, Use.5FU)] +
    Site.eff[match(dta$Site, Site.name)];
    dta$lambd.strat <- exp(log.lamb.strat)
    if (mdl.type=='gaussian') {
    dta$SSCs.simp <- rnorm(n = nn, mean = log(dta$lambd.simp), sd = 2)
    dta$SSCs.strat <- rnorm(n = nn, mean = log(dta$lambd.strat), sd = 2)
    }
    if (mdl.type=='poisson') {
    dta$SSCs.simp <- rpois(n = nn, lambda = dta$lambd.simp)
    dta$SSCs.strat <- rpois(n = nn, lambda = dta$lambd.strat)
    }
    if (mdl.type=='quasipoisson') {
    dta$SSCs.simp <- rnbinom(n = nn, mu = dta$lambd.simp, size = dta$lambd.simp/(disper-1))
    dta$SSCs.strat <- rnbinom(n = nn, mu = dta$lambd.strat, size = dta$lambd.strat/(disper-1))
    }

    m.pois.0.simp <- glm(formula = SSCs.simp ~ tx.simp, family = mdl.type, data = dta)
    m.pois.1.simp <- glm(formula = SSCs.simp ~ tx.simp + risk + Use.5FU + Site, family = mdl.type, data = dta)
    m.pois.0.strat <- glm(formula = SSCs.strat ~ tx.strat, family = mdl.type, data = dta)
    m.pois.1.strat <- glm(formula = SSCs.strat ~ tx.strat + risk + Use.5FU + Site, family = mdl.type, data = dta)

    est.0.simp[ii] <- coef(m.pois.0.simp)['tx.simpSIRO'];
    est.1.simp[ii] <- coef(m.pois.1.simp)['tx.simpSIRO'];

    est.0.strat[ii] <- coef(m.pois.0.strat)['tx.stratSIRO'];
    est.1.strat[ii] <- coef(m.pois.1.strat)['tx.stratSIRO'];

    ## Extract the standard errors for the treatment effect
    se.0.simp[ii] <- summary(m.pois.0.simp)$coef['tx.simpSIRO', 'Std. Error']
    se.1.simp[ii] <- summary(m.pois.1.simp)$coef['tx.simpSIRO', 'Std. Error']

    se.0.strat[ii] <- summary(m.pois.0.strat)$coef['tx.stratSIRO', 'Std. Error']
    se.1.strat[ii] <- summary(m.pois.1.strat)$coef['tx.stratSIRO', 'Std. Error']

    if (mdl.type=='quasipoisson') {
    disp.0.simp[ii] <- summary(m.pois.0.simp)$dispersion;
    disp.1.simp[ii] <- summary(m.pois.1.simp)$dispersion;
    disp.0.strat[ii] <- summary(m.pois.0.strat)$dispersion;
    disp.1.strat[ii] <- summary(m.pois.1.strat)$dispersion;
    }

    if (ii%%5000==0) {
    et <- proc.time()['elapsed'];
    cat(ii, round(et-st), 'secs. \n')
    }
    }

    ## Use summary statistics to check that the stratification is working
    ## Stratum specific means and SD's should be approximately the same under both simple and stratified randomisation
    ## But stratum specific counts should be balanced by arm under stratified randomisation only
    smry.strat % group_by(stratum, tx.strat) %>% summarise(mn.strat = mean.sd(SSCs.strat, cnt = T))
    smry.simp % group_by(stratum, tx.simp) %>% summarise(mn.simp = mean.sd(SSCs.simp, cnt = T))
    smry <- merge(smry.simp, smry.strat, by.x = c('stratum', 'tx.simp'), by.y = c('stratum', 'tx.strat'))
    assign(paste0('smry.', mdl.type), smry)

    ## Report results ####
    multi.strat.sd <- mean.sd(est.1.strat, digits.sd = 4)
    uni.strat.sd <- mean.sd(est.0.strat, digits.sd = 4)
    multi.simp.sd <- mean.sd(est.1.simp, digits.sd = 4)
    uni.simp.sd <- mean.sd(est.0.simp, digits.sd = 4)

    cat('\n\n**', paste(mdl.type, 'model treatment effect'), '**')
    cat('\nSimple randomisation\n')
    cat('Univariate model point estimate:', uni.simp.sd, '\n')
    cat('Multivariate model point estimate:', multi.simp.sd, '\n')

    cat('\nStratified randomisation\n')
    cat('Univariate model point estimate:', uni.strat.sd, '\n')
    cat('Multivariate model point estimate:', multi.strat.sd, '\n')

    uni.simp.mdl.sd % mean
    multi.simp.mdl.sd % mean
    uni.strat.mdl.sd % mean
    multi.strat.mdl.sd % mean
    }

    Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.