Robustness of ANCOVA in randomised trials with unequal randomisation

In my previous post I wrote about a new paper in Biometrics which shows that when a randomised trial is analysed using ANCOVA with adjustment for baseline covariates, not only is the treatment effect estimator consistent, but the usual model-based standard error (SE) is also valid, irrespective of whether the regression model is correctly specified. As I noted, these results were proved assuming that the trial used simple randomisation to two groups, with equal probability of randomisation to each.

In a pre-print available on arXiv, I extend this paper's results to the case where the randomisation probabilities are not equal. Although I think 1:1 randomisation is by far the most common approach, unequal randomisation is not that uncommon. In this situation the ANCOVA point estimator of the treatment effect is still consistent; this property is not affected by the unequal randomisation.
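To illustrate the consistency claim, here is a minimal Python simulation sketch. It assumes (illustratively, not from the pre-print) 2:1 simple randomisation, a single standard normal baseline covariate, and a true outcome model containing a treatment-by-covariate interaction, so the fitted main-effects ANCOVA model is misspecified. Because the covariate has mean zero, the average treatment effect equals `theta`, and in a large sample the ANCOVA treatment coefficient should be close to it. The function name `ancova_estimate` and all parameter values are hypothetical.

```python
import numpy as np

def ancova_estimate(n=400_000, p_treat=2/3, theta=0.5, gamma=1.0, seed=7):
    """Fit a main-effects ANCOVA model (hypothetical helper) to data whose true
    model includes a treatment-by-covariate interaction; return the treatment
    coefficient."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, p_treat, size=n)   # simple randomisation, P(treated) = p_treat
    x = rng.normal(size=n)                 # mean-zero covariate, so the average effect is theta
    # True model: treatment effect theta + gamma * x varies with the covariate
    y = 1.0 + (theta + gamma * x) * z + 0.8 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), z, x])  # ANCOVA design omits the z*x interaction
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print(ancova_estimate())  # should be close to theta = 0.5
```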

The analyses in the pre-print show that when the randomisation is not 1:1 and the outcome model is misspecified, the model-based SE is in general no longer consistent. It remains valid if the true regression coefficients of the outcome on the covariates are the same in the two treatment groups and the error variances in the two groups are equal, but otherwise in general it is not. For example, even if the true regression coefficients are equal in the two groups, the model-based SE is not valid if the error variances differ. Likewise, if the true regression coefficients differ between the two groups (i.e. there are interactions between treatment and some baseline covariates), the model-based SE would not in general be expected to be valid.
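A simple special case makes the dependence on the randomisation ratio explicit. Suppose, as an assumption for illustration, that the ANCOVA mean model is correct with a common slope and the covariates are independent of treatment, but the error variance is $\sigma_1^2$ in the treated arm and $\sigma_0^2$ in control, with a proportion $\pi$ randomised to treatment. Asymptotically the model-based variance of the treatment effect estimator is proportional to $\pi\sigma_1^2+(1-\pi)\sigma_0^2$ (the pooled residual variance), whereas its true variance is proportional to $(1-\pi)\sigma_1^2+\pi\sigma_0^2$, with the same factor $1/\{n\pi(1-\pi)\}$ in both. These agree when $\pi=1/2$ or $\sigma_1^2=\sigma_0^2$. The hypothetical helper below computes their ratio under these assumptions.

```python
def model_based_to_true_variance_ratio(p_treat, var_treat, var_control):
    """Asymptotic ratio of model-based to true variance of the ANCOVA treatment
    effect estimator, assuming a correct mean model with a common slope,
    covariates independent of treatment, and arm-specific error variances."""
    p = p_treat
    model_based = p * var_treat + (1 - p) * var_control   # pooled residual variance
    true = (1 - p) * var_treat + p * var_control          # from the sandwich form
    return model_based / true   # common factor 1 / (n p (1 - p)) cancels

print(model_based_to_true_variance_ratio(0.5, 4.0, 1.0))   # 1:1, ratio 1
print(model_based_to_true_variance_ratio(2/3, 4.0, 1.0))   # approximately 1.5: SE too large
print(model_based_to_true_variance_ratio(2/3, 1.0, 4.0))   # approximately 0.667: SE too small
```

With 2:1 randomisation the model-based SE is too large when the larger arm has the larger error variance, and too small in the reverse case.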

As a consequence, the type I error rate will not in general be controlled at the desired level, and confidence intervals will not have the correct coverage, even in large samples. Depending on the configuration, the model-based SE can be biased either upwards or downwards (see the pre-print for details).
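The loss of coverage can be seen in a small Monte Carlo sketch. The settings are assumed and illustrative, not taken from the pre-print: n = 600, 2:1 simple randomisation, one normal baseline covariate, a correctly specified mean with a common slope, and error SD 1 in the treated arm versus 2 in control. Here the model-based variance is asymptotically about two-thirds of the true variance, so the nominal 95% interval should cover noticeably less often than 95% (roughly 89% asymptotically), while under 1:1 randomisation coverage should be close to 95%. The function `ancova_coverage` is a hypothetical name.

```python
import numpy as np

def ancova_coverage(n=600, p_treat=2/3, sd_treat=1.0, sd_control=2.0,
                    n_sims=2000, seed=2019):
    """Monte Carlo coverage of the nominal 95% CI for the treatment effect
    using the model-based (homoscedastic OLS) standard error."""
    rng = np.random.default_rng(seed)
    theta = 0.5  # illustrative true treatment effect
    covered = 0
    for _ in range(n_sims):
        z = rng.binomial(1, p_treat, size=n)          # simple randomisation
        x = rng.normal(size=n)                        # baseline covariate
        sd = np.where(z == 1, sd_treat, sd_control)   # arm-specific error SD
        y = 1.0 + theta * z + 0.8 * x + sd * rng.normal(size=n)
        X = np.column_stack([np.ones(n), z, x])       # ANCOVA design matrix
        xtx_inv = np.linalg.inv(X.T @ X)
        beta = xtx_inv @ X.T @ y
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])     # pooled residual variance
        se_model = np.sqrt(sigma2 * xtx_inv[1, 1])    # model-based SE
        covered += abs(beta[1] - theta) < 1.96 * se_model
    return covered / n_sims

print(ancova_coverage())             # 2:1 randomisation: below 0.95
print(ancova_coverage(p_treat=0.5))  # 1:1 randomisation: close to 0.95
```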

These results mean that in trials using simple randomisation where the randomisation is not 1:1, if one is concerned that the ANCOVA model may be misspecified, the model-based SE shouldn't be used. Instead, robust sandwich SEs, which are widely available in statistical packages, are recommended. These provide asymptotically valid variance estimation under essentially arbitrary model misspecification.
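A minimal NumPy sketch of the HC0 (White) sandwich variance estimator for OLS, $(X^TX)^{-1}\left(\sum_i e_i^2 x_i x_i^T\right)(X^TX)^{-1}$, followed by a coverage check in an assumed, illustrative 2:1 setting with error SD 1 in the treated arm and 2 in control (values not from the pre-print). Function names are hypothetical. Packages provide this and small-sample variants; in Python's statsmodels, for example, `OLS(...).fit(cov_type="HC0")`, with `"HC1"` to `"HC3"` also available.

```python
import numpy as np

def ols_sandwich(X, y):
    """OLS coefficients with HC0 (White) heteroscedasticity-robust SEs."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X   # sum_i e_i^2 x_i x_i^T
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

def sandwich_coverage(n=600, p_treat=2/3, sd_treat=1.0, sd_control=2.0,
                      n_sims=2000, seed=2019):
    """Monte Carlo coverage of the 95% CI based on the sandwich SE."""
    rng = np.random.default_rng(seed)
    theta = 0.5  # illustrative true treatment effect
    covered = 0
    for _ in range(n_sims):
        z = rng.binomial(1, p_treat, size=n)          # simple randomisation
        x = rng.normal(size=n)                        # baseline covariate
        sd = np.where(z == 1, sd_treat, sd_control)   # arm-specific error SD
        y = 1.0 + theta * z + 0.8 * x + sd * rng.normal(size=n)
        X = np.column_stack([np.ones(n), z, x])
        beta, se = ols_sandwich(X, y)
        covered += abs(beta[1] - theta) < 1.96 * se[1]
    return covered / n_sims

print(sandwich_coverage())  # should be close to 0.95 despite 2:1 allocation
```

HC0 can be slightly anti-conservative in small samples, which is why variants such as HC1 to HC3 exist; at this sample size the differences are minor.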

December 2019 – this work has now been published in Biometrics.

Robustness of linear mixed models

Linear mixed models are an extremely flexible class of models for continuous outcomes where data are collected longitudinally, are clustered, or more generally have some sort of dependency structure between observations. They model outcomes using a combination of so-called fixed effects and random effects. Random effects allow for the possibility that one or more covariates have effects that vary from unit (cluster, subject) to unit. In the context of longitudinal repeated measures data, popular linear mixed models include the random-intercepts and random-slopes models, which respectively allow each unit to have its own intercept, or its own intercept and slope.
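To make the random-intercepts and random-slopes models concrete, the sketch below simulates from $y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i}) t_j + e_{ij}$, assuming for simplicity that the random intercepts $u_{0i}$, random slopes $u_{1i}$ and residuals $e_{ij}$ are independent and normal. With no random slope, any two measurements on the same unit have correlation $\sigma_{u0}^2/(\sigma_{u0}^2+\sigma_e^2)$; adding a random slope makes the outcome variance $\sigma_{u0}^2 + t_j^2\sigma_{u1}^2 + \sigma_e^2$ increase over time. The function `simulate_lmm` and its parameter values are hypothetical.

```python
import numpy as np

def simulate_lmm(n_units=20000, times=(0.0, 1.0, 2.0, 3.0), beta0=10.0, beta1=0.5,
                 sd_intercept=2.0, sd_slope=0.0, sd_resid=1.0, seed=1):
    """Simulate balanced longitudinal data from a random-intercepts
    (sd_slope=0) or random-slopes linear mixed model; returns an
    (n_units, n_times) array of outcomes."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times)
    u0 = rng.normal(0.0, sd_intercept, size=(n_units, 1))  # unit-specific intercept deviations
    u1 = rng.normal(0.0, sd_slope, size=(n_units, 1))      # unit-specific slope deviations
    e = rng.normal(0.0, sd_resid, size=(n_units, t.size))  # within-unit residual errors
    return (beta0 + u0) + (beta1 + u1) * t + e             # broadcasts to (n_units, n_times)

y = simulate_lmm()
print(np.corrcoef(y[:, 0], y[:, 3])[0, 1])   # near 4 / (4 + 1) = 0.8
y_slopes = simulate_lmm(sd_slope=1.0)
print(y_slopes.var(axis=0))                  # variance grows with time
```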

As implemented in statistical packages, linear mixed models assume that we have modelled the dependency structure correctly, that both the random effects and the within-unit residual errors follow normal distributions, and that these have constant variance. While it is possible to some extent to check these assumptions using various diagnostics, a natural concern is that if one or more of them do not hold, our inferences may be invalid. Fortunately, it turns out that linear mixed models are robust to violations of some of their assumptions.


The t-test and robustness to non-normality

The t-test is one of the most commonly used tests in statistics. The two-sample t-test allows us to test the null hypothesis that the population means of two groups are equal, based on samples from each of the two groups. In its simplest form, it assumes that in the population the variable of interest X follows a normal distribution $N(\mu_{1},\sigma^{2})$ in the first group and $N(\mu_{2},\sigma^{2})$ in the second group. That is, the variance is assumed to be the same in both groups, and the variable is normally distributed around its group mean. The null hypothesis is then that $\mu_{1}=\mu_{2}$.
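Under these assumptions the test uses the pooled-variance statistic $t = (\bar{x}_1-\bar{x}_2)/(s_p\sqrt{1/n_1+1/n_2})$, where $s_p^2$ is the pooled sample variance estimating the common $\sigma^2$, compared with a t distribution on $n_1+n_2-2$ degrees of freedom. A minimal standard-library Python sketch (the function name is hypothetical) computing the statistic and its degrees of freedom:

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x1, x2):
    """Two-sample t statistic assuming equal variances, and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    # Pooled estimate of the common variance sigma^2
    s2_pooled = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2

t, df = pooled_t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)  # t is -2 (up to floating point), df is 8
```

Here both samples have variance 2.5, so $s_p^2 = 2.5$ and the standard error is $\sqrt{2.5 \times 0.4} = 1$, giving $t = (3 - 5)/1 = -2$.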
