I’ve recently had the opportunity to spend a little time looking at an interesting approach for improving the efficiency of estimated treatment effects in clinical trials which exploits historical data. In this blog post I’ll give a few thoughts on the results in ‘Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score’, by Schuler et al 2022. The paper was published in the International Journal of Biostatistics, and an arXiv pre-print is available here.

## The idea

The idea is to use historical data to build a prognostic model for outcome (given covariates). In the clinical trial, we then adjust for each patient’s predicted outcome based on the prognostic model, plus the covariates themselves. The approach may gain efficiency relative to just adjusting for the covariates individually (modelling these linearly in the trial) if in truth the covariates have non-linear effects on outcomes which are modelled in the historical data. As the authors point out, one could of course model the non-linear effects in the model fitted to the trial data. But accurately modelling such effects may be more feasible when the historical data is large, whereas the trial dataset may be relatively small.
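As a minimal sketch of the workflow just described (not the authors' code), the example below simulates historical data, fits a prognostic model to it, and then compares trial analyses with and without the prognostic score as a covariate. Everything specific here is an assumption made for illustration: a single covariate, the outcome function `2x + 2cos(2πx)`, historical patients all being untreated, a simple binned-means smoother standing in for a flexible learner such as random forest, main effects only in the trial model, and the helper names `simulate`, `fit_binned_means` and `ols`.

```python
import math
import random
import statistics


def simulate(n, rng, effect=0.0, randomize=True):
    """Simulate covariate x, treatment z and outcome y (illustrative model)."""
    x = [rng.random() for _ in range(n)]
    z = [rng.randint(0, 1) if randomize else 0 for _ in range(n)]
    y = [2 * xi + 2 * math.cos(2 * math.pi * xi) + effect * zi + rng.gauss(0, 1)
         for xi, zi in zip(x, z)]
    return x, z, y


def fit_binned_means(x, y, bins=20):
    """Stand-in prognostic model: mean outcome within equal-width bins of x."""
    sums, counts = [0.0] * bins, [0] * bins
    for xi, yi in zip(x, y):
        b = min(int(xi * bins), bins - 1)
        sums[b] += yi
        counts[b] += 1
    overall = sum(y) / len(y)  # fallback for any empty bin
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda xi: means[min(int(xi * bins), bins - 1)]


def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


rng = random.Random(2022)

# Step 1: build the prognostic model on (assumed untreated) historical data.
hist_x, _, hist_y = simulate(5000, rng, randomize=False)
score = fit_binned_means(hist_x, hist_y)

# Step 2: analyse many simulated trials with and without the score.
est_linear, est_score = [], []
for _ in range(300):
    x, z, y = simulate(200, rng, effect=1.0)
    # Columns: intercept, treatment, covariate (and prognostic score).
    est_linear.append(ols([[1.0, zi, xi] for zi, xi in zip(z, x)], y)[1])
    est_score.append(
        ols([[1.0, zi, xi, score(xi)] for zi, xi in zip(z, x)], y)[1])

var_linear = statistics.variance(est_linear)
var_score = statistics.variance(est_score)
print(f"Empirical variance, linear adjustment only: {var_linear:.4f}")
print(f"Empirical variance, plus prognostic score:  {var_score:.4f}")
```

In this deliberately extreme setup the cosine term is uncorrelated with the covariate, so linear adjustment recovers essentially none of it and the score-adjusted analysis should show a clearly smaller empirical variance. How large the gain is in practice depends on how non-linear the true effects really are.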

## Simulation results and case study

The authors present simulation results in a range of scenarios, demonstrating some dramatic reductions in the variance of the estimated treatment effect when adjusting for the patient’s prognostic score (constructed using simulated historical data), relative to both an analysis that does not adjust for covariates and one that adjusts linearly for the covariates (plus their interactions with treatment). In the simulation setup, a set of 10 independent uniformly distributed covariates had linear and non-linear effects on the outcome mean.

To examine the results myself a bit further, I coded up the basic elements of the simulation study (R script here). The baseline simulation scenario is described as having ‘moderate outcome non-linearity’, i.e. moderate non-linear effects of the covariates on the outcome. To get a sense of how much omitting the non-linear effects reduces the amount of explained variation in the outcome, I simulated a large dataset with n=10,000 individuals. I first fitted the linear model with treatment and the covariates included linearly. This gives an R squared of about 35% (one could evaluate this analytically, but I am being lazy). Using instead the true conditional mean as a covariate (which includes the linear and non-linear effects), the R squared increases to about **90%**. Adding the true conditional mean corresponds to using a patient’s true prognostic score as a covariate, termed the oracle estimator in the paper. In the setup considered, the non-linear effects thus have a huge impact on the amount of explained variation in the outcome, which I think explains the quite dramatic reductions in variance shown in the simulation results. The actual reductions in variance achieved through adjustment for the *estimated* prognostic score are necessarily not quite as good as when using the true prognostic score, because the prognostic model has to be estimated from the historical data (for which the authors used random forest – more on this below).
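As a rough back-of-the-envelope check on what those two R squared values imply (a large-sample approximation rather than a result taken from the paper, assuming constant residual variance and taking the approximate values of 35% and 90% at face value): the variance of a covariate-adjusted treatment effect estimate is roughly proportional to the residual variance of the outcome model, i.e. to one minus R squared. Writing $\hat\tau$ for the estimated treatment effect,

$$
\frac{\operatorname{Var}(\hat\tau_{\text{oracle}})}{\operatorname{Var}(\hat\tau_{\text{linear}})}
\approx \frac{1 - R^2_{\text{oracle}}}{1 - R^2_{\text{linear}}}
\approx \frac{1 - 0.90}{1 - 0.35}
= \frac{0.10}{0.65}
\approx 0.15 .
$$

On this approximation, adjusting for the true prognostic score would cut the variance to around 15% of that under linear adjustment, which corresponds to standard errors roughly 0.39 times as large.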

In practice, even if some covariates have somewhat non-linear effects on outcome, I am doubtful one would often see such huge improvements in R squared, and hence such large reductions in variance of the estimated treatment effect may be somewhat exceptional. Indeed, the authors’ case study using data from Alzheimer’s disease studies seems to bear this out: the reductions in the estimated standard errors from adding the prognostic score as a covariate (to a model already including the covariates linearly) were pretty small.

## Size of historical training data and machine learning

In the simulation study, the historical training data were of size 10,000, while the trial sample size was 500. Since random forest was used to fit the prognostic model to the simulated historical data, and flexible machine learning techniques generally require large amounts of data to accurately capture the dependence of the outcome on covariates, one wonders how performance might be affected if the historical data were not 20 times the size of the trial data, as was the case in the simulation study. For example, if one had a previously conducted trial dataset of the same size as the trial to be run, with hundreds not thousands of patients, random forest would presumably do less well at accurately capturing the non-linear effects.
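To illustrate the concern, here is a small sketch that compares how well a flexible prognostic model recovers the true conditional mean when trained on 500 versus 10,000 historical patients. It is not a reproduction of the paper's simulation: the single covariate, the outcome function, the binned-means smoother (standing in for random forest) and the helper names `true_mean`, `fit_binned_means` and `score_mse` are all assumptions chosen to keep the example short.

```python
import math
import random


def true_mean(x):
    """Assumed true conditional mean: a linear plus a non-linear term."""
    return 2 * x + 2 * math.cos(2 * math.pi * x)


def fit_binned_means(x, y, bins=20):
    """Stand-in flexible learner: mean outcome within equal-width bins of x."""
    sums, counts = [0.0] * bins, [0] * bins
    for xi, yi in zip(x, y):
        b = min(int(xi * bins), bins - 1)
        sums[b] += yi
        counts[b] += 1
    overall = sum(y) / len(y)  # fallback for any empty bin
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda xi: means[min(int(xi * bins), bins - 1)]


def score_mse(n_hist, rng, n_test=2000):
    """Mean squared error of the estimated score against the true mean."""
    hist_x = [rng.random() for _ in range(n_hist)]
    hist_y = [true_mean(xi) + rng.gauss(0, 1) for xi in hist_x]
    predict = fit_binned_means(hist_x, hist_y)
    test_x = [rng.random() for _ in range(n_test)]
    return sum((predict(xi) - true_mean(xi)) ** 2 for xi in test_x) / n_test


rng = random.Random(500)
mse = {n_hist: score_mse(n_hist, rng) for n_hist in (500, 10000)}
for n_hist, value in mse.items():
    print(f"historical n = {n_hist:>6}: MSE of estimated score = {value:.4f}")
```

With one covariate the penalty for a small historical dataset is modest. With 10 covariates, as in the paper's simulations, a flexible learner has far more structure to learn, so one would expect the loss in accuracy to be larger.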

## Conclusions

The idea of developing a prognostic covariate based on historical data is interesting and may prove useful in some settings. However, I am not yet personally convinced that it will often materially improve precision over and above using a model that adjusts linearly for a small number of key covariates chosen because they are known to be strongly prognostic for outcome (which is the standard practice). Re-analysing more previously conducted clinical trials (and datasets serving as historical data) could be helpful to investigate how much efficiency might be gained. Moreover, recent work has shown that machine learning type approaches can be used to flexibly adjust for covariates using only the in-trial data – it would be interesting to see how this performs relative to adjusting for a prognostic score estimated using historical data.

Thanks for drawing attention to this paper. David Cox suggested adjusting for a prognostic score many years ago (1) and it is referred to in a recent paper with Sandra Siegfried and Torsten Hothorn (2), the appendix to which discusses the three consequences (in the case of a linear model) of adjusting: 1) reduced expected mean square error, 2) an increased penalty for fitting due to loss of orthogonality, and 3) loss of second order efficiency (efficiency in estimating the residual variance) due to fewer degrees of freedom. The net effect of the three is that there is scope for increased efficiency by reducing the dimensions of the model through using one predictor rather than the many used to construct it; I agree with you that the effect is likely to be modest. It might be worth it if 1) there is a large robust historical data-set to construct a predictive score and 2) the current trial is small.

I am also reminded of John Tukey’s comment “We are much too prone to worry about the imperfections of adjustment” (3, p272). See also his earlier paper (4).

References

(1) Cox DR. Randomization and concomitant variables in the design of experiments. In: Kallianpur G, Krishnaiah P, Ghosh J, editors. Statistics and probability: Essays in honor of CR Rao. North-Holland; 1982. p. 197–202.

(2) Siegfried S, Senn S, Hothorn T. On the relevance of prognostic information for clinical trials: A theoretical quantification. Biom J. 2022. doi:10.1002/bimj.202100349

(3) Tukey JW. Tightening the clinical trial. Controlled Clinical Trials. 1993;14(4):266-85.

(4) Tukey JW. Use of Many Covariates in Clinical-Trials. International Statistical Review. 1991;59(2):123-37.