Last week I attended the International Society for Clinical Biostatistics' conference in Vigo, Spain. I spoke about work I've been doing recently on covariate adjusted mean estimation in randomised trials. A pre-print draft of the work is available at arXiv.

The work examines estimation of marginal means under each treatment, that is, the mean outcome in the population under assignment to each of the treatments under investigation. Previous research by Qu and Luo had advocated baseline covariate adjusted estimates of these means, and described a delta method variance estimator for them.
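As a rough illustration of what a covariate adjusted marginal mean is, the sketch below fits a linear outcome model with treatment and one baseline covariate, predicts every patient's outcome under each treatment in turn, and averages those predictions (often called standardisation). The data are simulated, and the function names, sample size and coefficients are all hypothetical choices made for the example rather than anything taken from the paper.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def adjusted_means(z, x, y):
    """Standardised mean for each arm: predict for all patients, then average."""
    beta = fit_ols([[1.0, zi, xi] for zi, xi in zip(z, x)], y)
    means = {}
    for arm in (0, 1):
        predictions = [beta[0] + beta[1] * arm + beta[2] * xi for xi in x]
        means[arm] = sum(predictions) / len(predictions)
    return means, beta


# Hypothetical simulated trial: simple randomisation, one baseline covariate.
random.seed(1)
n = 200
z = [random.randint(0, 1) for _ in range(n)]
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * zi + 2.0 * xi + random.gauss(0, 1) for zi, xi in zip(z, x)]

means, beta = adjusted_means(z, x, y)
crude = {arm: sum(yi for yi, zi in zip(y, z) if zi == arm) / z.count(arm)
         for arm in (0, 1)}
print("adjusted means:", means)
print("crude means:   ", crude)
```

For this linear model with no treatment by covariate interaction, the difference between the two adjusted means equals the fitted treatment coefficient `beta[1]` exactly.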

In the paper I consider under what assumptions such estimates are consistent for the true population values. When the outcome model used is a generalised linear model with canonical link, it turns out that estimates are consistent even when the outcome model is misspecified, a rather amazing result, which was earlier proved by Rosenblum and van der Laan. For negative binomial regression, estimates of means/rates are consistent provided the conditional mean function is correctly specified.

Why use or report covariate adjusted means, in addition to the crude means by treatment group? First, for the same reason as we adjust for covariates in the analysis of trials - we typically obtain more precise estimates. Here, the covariate adjusted means can be viewed as adjusting the crude treatment group means for chance imbalance in the distribution of the baseline covariates between groups. Another appealing property is that for many model types, the difference or ratio of the adjusted group means exactly matches the outcome model estimated treatment effect.
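The precision gain can be seen in a small simulation. The sketch below is only illustrative: it assumes a linear outcome model with a single covariate and no interaction, in which case the adjusted mean for an arm can be written as the crude mean minus the pooled within-arm slope multiplied by that arm's covariate imbalance. The trial size, number of repetitions and coefficients are hypothetical.

```python
import random
import statistics


def crude_and_adjusted_mean(z, x, y, arm=1):
    """Crude mean for one arm, and the same mean adjusted for covariate imbalance."""
    groups = {0: ([], []), 1: ([], [])}
    for zi, xi, yi in zip(z, x, y):
        groups[zi][0].append(xi)
        groups[zi][1].append(yi)
    # Pooled within-arm slope: the covariate coefficient from a linear model
    # containing an intercept, treatment and the covariate.
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
        sxy += sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
        sxx += sum((a - xbar) ** 2 for a in xs)
    slope = sxy / sxx
    xs, ys = groups[arm]
    crude = statistics.fmean(ys)
    # Shift the crude mean by the arm's chance imbalance in the covariate.
    adjusted = crude - slope * (statistics.fmean(xs) - statistics.fmean(x))
    return crude, adjusted


def simulate(n_trials=500, n=100, seed=2024):
    """Empirical SD of the crude and adjusted arm 1 means over repeated trials."""
    rng = random.Random(seed)
    crude_estimates, adjusted_estimates = [], []
    for _ in range(n_trials):
        z = [0, 1] * (n // 2)
        rng.shuffle(z)  # 1:1 allocation, not stratified on the covariate
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + 0.5 * zi + 2.0 * xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]
        crude, adjusted = crude_and_adjusted_mean(z, x, y)
        crude_estimates.append(crude)
        adjusted_estimates.append(adjusted)
    return statistics.stdev(crude_estimates), statistics.stdev(adjusted_estimates)


sd_crude, sd_adjusted = simulate()
print("empirical SD of crude mean:   ", round(sd_crude, 3))
print("empirical SD of adjusted mean:", round(sd_adjusted, 3))
```

With a covariate this strongly related to outcome, the adjusted mean varies noticeably less across repeated trials than the crude mean. With a weakly prognostic covariate the two would be much closer.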

In the paper I derive a variance estimator which allows for the covariates being random in repeated trials, unlike the paper by Qu and Luo mentioned earlier, which treated the covariates as fixed. Simulations demonstrate that confidence intervals constructed assuming the covariates are fixed undercover when in truth they are random in repeated trials, although the simulation evidence suggests the undercoverage may typically be minor unless the baseline covariates are very strongly associated with outcome.

I also examine application of existing semiparametric theory for estimation of marginal parameters in randomised trials by Tsiatis and colleagues. These estimators offer the potential for more precise estimates, which are also guaranteed to be consistent even under misspecification of the outcome model.

A further advantage of the covariate adjusted means is that, when some outcomes are missing, they are consistent under weaker assumptions (MAR conditional on treatment group and covariates) than the crude group means (MAR conditional on treatment group).
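A minimal simulated sketch of the missing data point follows. It looks at a single treatment arm for brevity, makes the chance of observing the outcome depend on the baseline covariate (so outcomes are MAR given the covariate but not completely at random), and assumes the outcome model is correctly specified. All numbers are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)
n = 20000
true_mean = 1.0  # marginal mean of y, since x has mean zero

x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_mean + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# The outcome is more likely to be observed for patients with larger x.
observed = [rng.random() < 1.0 / (1.0 + math.exp(-xi)) for xi in x]
x_obs = [xi for xi, seen in zip(x, observed) if seen]
y_obs = [yi for yi, seen in zip(y, observed) if seen]

# Crude mean: average of the observed outcomes only.
crude = statistics.fmean(y_obs)

# Adjusted mean: fit a simple linear regression to the complete cases, then
# average the predictions over all patients, including those missing y.
xbar, ybar = statistics.fmean(x_obs), statistics.fmean(y_obs)
slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x_obs, y_obs))
         / sum((a - xbar) ** 2 for a in x_obs))
intercept = ybar - slope * xbar
adjusted = statistics.fmean([intercept + slope * xi for xi in x])

print("true mean:    ", true_mean)
print("crude mean:   ", round(crude, 3))
print("adjusted mean:", round(adjusted, 3))
```

The crude mean is pulled upwards because patients with larger covariate values, and hence larger outcomes, are over-represented among those observed, whereas the adjusted mean averages over the covariate distribution of all patients.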

A complicating factor is the use of randomisation schemes which make use of patients' covariates. Stratified randomisation is arguably the most common. I argue, and demonstrate in a special case, that when randomisation is stratified on baseline covariates, there is no potential to gain precision over and above the crude group means. However, as with estimation of treatment effects, ignoring the stratification factors in the analysis results in conservative confidence intervals which overcover.

If anyone has any comments on the draft, I'd be pleased to receive them, either here as a comment on the post or via email.