The FDA recently published revised guidance on statistical methods for adjusting for baseline covariates in trials. Overall I like the guidance and think it will prove useful. In this post I’ll give a few thoughts on aspects of the revised guidance, organised according to the sections of the guidance document.

## General considerations

**Robust standard errors**

Robust sandwich standard errors are advocated rather than ‘nominal standard errors’ (model based ones), on the basis that model based SEs may be biased when the outcome regression is misspecified. Actually, for linear models, the model based SE is valid even under misspecification (Wang et al 2000) when randomisation is 1:1. When it is not 1:1 however, sandwich SEs are indeed needed (Bartlett 2020) even in the linear model case.
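To make the role of the allocation ratio concrete, here is a minimal sketch in the simplest possible setting: a regression of the outcome on treatment alone, where unequal outcome variances between arms stand in for model misspecification. This is an unadjusted analogue rather than the covariate adjusted case discussed in the papers above, and the sample sizes, means and standard deviations are made-up values for illustration.

```python
import math
import random


def two_group_ses(y1, y0):
    """Return (model_based_se, sandwich_se) for the difference in means.

    This matches regressing the outcome on a treatment indicator alone:
    the model-based SE pools the residual variance across arms, while the
    HC0 sandwich SE lets each arm contribute its own squared residuals.
    """
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    ss1 = sum((y - m1) ** 2 for y in y1)
    ss0 = sum((y - m0) ** 2 for y in y0)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    model_se = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    sandwich_se = math.sqrt(ss1 / n1 ** 2 + ss0 / n0 ** 2)
    return model_se, sandwich_se


def simulate(n1, n0, seed=1):
    """Simulate arms with unequal variances (hypothetical values)."""
    rng = random.Random(seed)
    y1 = [rng.gauss(1.0, 3.0) for _ in range(n1)]  # treated arm, SD 3
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n0)]  # control arm, SD 1
    return two_group_ses(y1, y0)


equal_alloc = simulate(300, 300)    # 1:1 allocation
unequal_alloc = simulate(400, 200)  # 2:1 allocation

print("1:1 model-based vs sandwich:", equal_alloc)
print("2:1 model-based vs sandwich:", unequal_alloc)
```

With 1:1 allocation the two standard errors agree almost exactly, whereas with 2:1 allocation the pooled, model based SE no longer matches the sandwich SE.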

**Stratified randomisation and standard errors**

Often randomisation is stratified on some of the prognostic factors. If one correctly specifies the outcome regression and adjusts for the variables used in the stratified randomisation, valid inferences are obtained. If, however, one views the outcome model as a possibly misspecified working model, how can we obtain valid standard errors and tests?

In the linear model case, the FDA guidance recommends using the methods of Bugni et al 2018. They show (as was already known earlier) that if one uses stratified randomisation but performs a t-test comparing groups ignoring the stratification factors, inferences are conservative. Bugni et al describe a modification to this standard t-test which ensures the type 1 error control is exactly the desired level (i.e. removing the conservativeness).

Bugni et al also consider ‘t-test with strata fixed effects’, which is a linear model with treatment and indicators for each stratum as covariates, using the usual heteroscedastic robust SEs. It’s important to note the model here is not a model where the baseline variables used to perform the stratified randomisation are entered themselves as covariates, but rather indicators for membership of strata defined by these variables. Thus if one stratifies randomisation on two binary covariates, there are four strata, and this method includes indicators for each of these as covariates. They show that the test of treatment here has correct type 1 error if randomisation is 1:1. In fact they also show that the usual model based SE provides exact (asymptotically) type 1 error control, provided randomisation is 1:1.
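The distinction between adjusting for the stratification variables themselves and adjusting for stratum membership can be sketched as follows. The function name and the toy data are hypothetical. With two binary factors, entering the factors as covariates gives two columns, while the strata approach gives one indicator per cell (in a model with an intercept, one of these would be dropped as the reference level).

```python
from itertools import product


def stratum_indicators(x1, x2):
    """Map two binary stratification factors to four stratum indicator columns.

    Column order follows (x1, x2) = (0,0), (0,1), (1,0), (1,1).
    """
    levels = list(product((0, 1), repeat=2))
    return [[int((a, b) == level) for level in levels] for a, b in zip(x1, x2)]


# Toy values for five participants (purely illustrative).
x1 = [0, 0, 1, 1, 1]
x2 = [0, 1, 0, 1, 1]
rows = stratum_indicators(x1, x2)

for a, b, row in zip(x1, x2, rows):
    print(a, b, row)
```

Each participant belongs to exactly one of the four strata, so each row contains a single 1. The indicator coding is equivalent to including both factors and their interaction, which is why it differs from a main-effects-only adjustment.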

## Linear Models

**Conditional vs unconditional effects**

For linear models one of the things stated is that:

“Covariate adjustment through a linear model (without treatment by covariate interactions) also estimates a conditional treatment effect, which is a treatment effect assumed to be approximately constant across subgroups defined by baseline covariates in the model.”

This is saying that for linear models which don’t include treatment by covariate interactions, the treatment effect estimate has both an interpretation as a marginal treatment effect (how does the population mean outcome change if you switch from treatment A to B) and as a conditional effect (how does the mean outcome change in subpopulations defined by levels of the covariates adjusted for). Of course for the latter interpretation to be correct, as the quoted text says, it must be the case that in truth these conditional/subpopulation effects are identical across these subpopulations. In reality there is no reason why this will necessarily be true, certainly not exactly.

**Including interactions**

The guidance notes that

“The linear model may include treatment by covariate interaction terms. However, when using this approach, the primary analysis should still be based on an estimate from the model of the average treatment effect.”

I don’t think it is immediately obvious how a model which includes interactions between treatment and baseline covariates can be used to obtain an estimate of the unconditional/marginal effect. To do so, one can use equation 5 of Tsiatis et al 2008, where h0(Xi) and h1(Xi) are the model predicted outcome means under control and active treatment from the fitted outcome model.
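One simple way to see how model predictions under each arm can be turned into a marginal effect is standardisation: predict every participant's outcome under control and under active treatment, then average the differences. The sketch below illustrates that general idea and is not a transcription of the Tsiatis et al formula. It assumes a single covariate, in which case a linear model with a treatment by covariate interaction gives the same fitted values as separate regressions in each arm. All names and the data generating values are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def standardised_effect(x, z, y):
    """Average of h1(Xi) - h0(Xi) over all participants in both arms."""
    a1, b1 = fit_line([xi for xi, zi in zip(x, z) if zi == 1],
                      [yi for yi, zi in zip(y, z) if zi == 1])
    a0, b0 = fit_line([xi for xi, zi in zip(x, z) if zi == 0],
                      [yi for yi, zi in zip(y, z) if zi == 0])
    h1 = [a1 + b1 * xi for xi in x]  # predicted mean under active treatment
    h0 = [a0 + b0 * xi for xi in x]  # predicted mean under control
    return sum(p1 - p0 for p1, p0 in zip(h1, h0)) / len(x)


# Simulated trial (hypothetical values): the effect of treatment varies
# with x, and the true marginal effect is 2 because x has mean zero.
rng = random.Random(2021)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y = [1 + 2 * zi + 0.5 * xi + 1.0 * zi * xi + rng.gauss(0, 1)
     for xi, zi in zip(x, z)]

estimate = standardised_effect(x, z, y)
print(round(estimate, 3))
```

The key point is that the predictions are averaged over the covariate values of all participants, not just those in one arm. A standard error for this estimator needs separate treatment and is not covered by this sketch.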

## Nonlinear Models

**Collapsibility**

The guidance helpfully explains the issue of non-collapsibility, which affects odds ratios and hazard ratios. The simple table demonstrating non-collapsibility is particularly appealing.
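Non-collapsibility is easy to reproduce with a few lines of arithmetic. The risks below are hypothetical values chosen for illustration, not the numbers in the guidance's table: two equally sized strata, each with the same conditional odds ratio.

```python
def odds(p):
    return p / (1 - p)


def odds_ratio(p_treated, p_control):
    return odds(p_treated) / odds(p_control)


# Hypothetical outcome risks in two equally sized strata.
risk = {
    "low risk": {"control": 0.2, "treated": 0.5},
    "high risk": {"control": 0.5, "treated": 0.8},
}

# Conditional (stratum-specific) odds ratios.
conditional_or = {s: odds_ratio(r["treated"], r["control"])
                  for s, r in risk.items()}

# Marginal risks average over the strata, which are equally sized and,
# by randomisation, equally represented in both arms.
marginal_control = sum(r["control"] for r in risk.values()) / len(risk)
marginal_treated = sum(r["treated"] for r in risk.values()) / len(risk)
marginal_or = odds_ratio(marginal_treated, marginal_control)

print(conditional_or)
print(round(marginal_or, 2))
```

Both conditional odds ratios are 4, yet the marginal odds ratio is about 3.45, even though there is no confounding and the conditional effect is identical in the two strata.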

**Conditional effects estimated by outcome regression**

The guidance notes that regression (e.g. logistic) models adjusting for covariates estimate conditional effects. The difficulty, as per my comments above in relation to linear models, is that there is no reason why, say, the conditional odds ratio should be the same across subpopulations defined by the baseline covariates. The guidance states:

“Sponsors should discuss with the relevant review divisions specific proposals in a protocol or statistical analysis plan containing nonlinear regression to estimate conditional treatment effects for the primary analysis. When estimating a conditional treatment effect through nonlinear regression, the model will generally not be exactly correct, and results can be difficult to interpret if the model is misspecified and treatment effects substantially differ across subgroups. Interpretability increases with the quality of model specification.”

This acknowledges the issue, but it is not clear to me how one can operationalise this given requirements for prespecification of a primary analysis model. At the very least, I think it indicates that one should perform diagnostics to detect whether the conditional effects are constant, but what would one do if these reveal heterogeneity? Moreover, the power to detect model misspecification could be low, such that one may conclude there is no evidence against a null hypothesis of common conditional effects even in cases where there may be moderate heterogeneity of these effects.

**Covariate adjusted estimation of unconditional effects**

The guidance indicates trials could use methods which exploit baseline covariates for improved power but still target the marginal or unconditional treatment effect (e.g. Moore and van der Laan 2009). The guidance even helpfully gives a recipe for how to construct such estimators, and by doing so demonstrates they are pretty straightforward to implement. One thing the guidance does not mention is how to obtain valid inferences using such methods when using stratified randomisation and when working outcome models may be misspecified. Fortunately, Wang et al 2019 recently extended the aforementioned results of Bugni et al 2018 to show this can be achieved.

**Marginal effects and transporting**

One criticism of marginal effects, or at least of their estimation from randomised trials, is that estimation using standard methods implicitly relies on an assumption that the patients in the trial are a representative sample from the population of interest. As has been helpfully pointed out to me in the past (e.g. by Stephen Senn and Frank Harrell), this is never really the case. Given this, and the fact that the magnitude of marginal effects can change if you were to modify the population definition, this raises some doubts about the interpretation of estimates of marginal effects from randomised trials. In this regard, there has been quite a lot of work (which so far I am only somewhat familiar with) which looks at how to combine data from a trial with external information about the target population of interest, in order to estimate the effect in the target population. Recent papers on these developments include Ackerman et al 2020 and Dahabreh et al 2020.
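The dependence of a marginal effect on the make-up of the population can be shown with a toy reweighting calculation. This is only a sketch of the basic idea, not the methods of the papers cited above. It assumes the stratum-specific risk differences carry over unchanged to the target population and that the target's stratum mix is known. All numbers are hypothetical.

```python
def marginal_risk_difference(stratum_effects, weights):
    """Weighted average of stratum-specific risk differences."""
    total = sum(weights[s] for s in stratum_effects)
    return sum(stratum_effects[s] * weights[s] for s in stratum_effects) / total


# Hypothetical stratum-specific risk differences estimated in a trial.
stratum_rd = {"mild": 0.10, "severe": 0.30}

# Hypothetical stratum proportions in the trial and in the target population.
trial_mix = {"mild": 0.5, "severe": 0.5}
target_mix = {"mild": 0.8, "severe": 0.2}

rd_trial = marginal_risk_difference(stratum_rd, trial_mix)
rd_target = marginal_risk_difference(stratum_rd, target_mix)

print(round(rd_trial, 2), round(rd_target, 2))
```

Here the marginal risk difference is 0.20 in the trial's case mix but 0.14 in the target population, purely because the proportion of severe patients differs.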

Great post! Important notes on (in)validity of the model, estimation of a treatment effect, and interpretation.

Some lessons for me on covariate adjusted treatment effect estimation:

– For 1:1 randomization, there is nothing wrong with the model based SE and p-value

– The conditional effect is what we should be interested in IF decisions are made for the individual patient

– Illustration with a simple table for non-collapsibility is a good idea https://pubmed.ncbi.nlm.nih.gov/10783203/

– Marginal effects have poor transportability, so actually make an extra assumption, i.e. that the RCT sample is a representative sample from the population where the treatment will be applied

– Interaction analysis in an RCT is like subgroup effect modeling, where we should remain skeptical, even if estimating marginal effects after conditioning including interactions. I agree that it is unclear how “interpretability increases with a more complex model” to estimate a marginal treatment effect. Such an approach anyway suffers from the transportability problem of marginal estimates.

Thanks for your post!

In your post in 2015, you talked about the importance of using the outcome when imputing the missing covariates

http://thestatsgeek.com/2015/05/07/including-the-outcome-in-imputation-models-of-covariates/

In the proposed FDA guidance, it says

“Covariate adjustment is generally robust to the handling of subjects with missing baseline covariates. Missing baseline covariate values can be singly or multiply imputed, or missingness indicators (Groenwold et al. 2012) can be added to the model used for covariate adjustment. Sponsors should not perform imputation separately for different treatment groups, and sponsors should ensure that imputed baseline values are not dependent on any post-baseline variables, including the outcome.”

How do you square what is in the guidance with what is in your 2015 post?

All the best,

Dan

Thanks Dan. I must admit I somehow didn’t pay much attention to this aspect of the FDA guidance! In general if you are using MI to impute missing covariates you must condition on the outcome variable so that in the imputed datasets the covariate has the correct joint distribution with the other variables, including outcome. However, because baseline covariate adjustment serves only to improve precision in RCTs (at least if outcomes are complete), certain methods ‘work’ which don’t work more generally. This explains some of the recommendations in White and Thompson 2005 (https://doi.org/10.1002/sim.1981), the abstract of which closely aligns with the FDA guidance. However, Section 9.2 of the paper (correctly) says that if you are going to use MI to impute missing baselines, you must adjust for treatment and outcome in the imputation model.

In practice it is rare in my experience to have missing baseline values but the outcome complete. When you have missingness in both, MI in certain situations can be a neat solution for handling the missingness in both the covariates and outcomes.

Best wishes,

Jonathan