The reference-based approach to imputing missing data has become popular in clinical trials, as I’ve blogged about previously. In the standard approach, the multiple imputations are generated as draws from the posterior distribution under a Bayesian model. With a continuous outcome, each of the imputed datasets is analysed using a linear regression model for the outcome (typically measured at the final time point), with treatment group and some baseline variables as covariates.
In a new pre-print available on arXiv, joint work with Marcel Wolbers and colleagues at Roche, we propose an alternative approach to reference-based imputation for continuous outcomes. This approach results in a treatment effect point estimate and (frequentist) standard error without any Monte-Carlo error.
In this approach, we adopt the maximum likelihood approach to multiple imputation (MI), where we impute conditional on the maximum likelihood estimates of the imputation model parameters, rather than conditional on a draw from the posterior distribution of the imputation model parameters. This means we avoid using MCMC.
Next, rather than imputing from the conditional distribution of the missing values given the observed data, under the chosen reference-based assumption, we impute non-stochastically using the corresponding conditional mean. We then fit the linear regression outcome model to this single imputed dataset. In general, imputing missing values by their conditional mean does not lead to consistent parameter estimates. Here, however, it is the outcome variable that is being imputed, and because of the particular form of the ordinary least squares estimating equations for linear regression, imputing the conditional mean does give consistent estimates. In fact, the resulting treatment effect estimate is essentially equal to the estimate you would get from the standard Bayesian approach with multiple imputations. This is demonstrated empirically in the simulations in the paper.
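To make the idea concrete, here is a minimal Python sketch of conditional mean imputation followed by the outcome regression. It is deliberately much simpler than the setting in the paper: it assumes a single follow-up outcome, one baseline covariate, and a "jump to reference"-style assumption under which every missing outcome is imputed from the reference arm's regression on baseline. The function name, variable names and simulated data are all hypothetical and are not taken from the paper or any accompanying software.

```python
import numpy as np

def cond_mean_impute_estimate(y, x, z):
    """Treatment effect after imputing missing outcomes by a reference-arm conditional mean.

    y: outcome, with np.nan where missing
    x: baseline covariate
    z: treatment indicator (1 = active, 0 = reference)
    Simplified single-visit illustration; an assumption, not the paper's full model.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    obs = ~np.isnan(y)

    # Imputation model: regress y on x among observed reference-arm patients.
    # For a normal linear model the OLS coefficients are the ML estimates,
    # so no posterior draws (and no MCMC) are involved.
    ref = obs & (z == 0)
    X_ref = np.column_stack([np.ones(ref.sum()), x[ref]])
    coef_ref, *_ = np.linalg.lstsq(X_ref, y[ref], rcond=None)

    # Non-stochastic imputation: fill each missing outcome with its
    # conditional mean under the reference-based assumption.
    y_imp = y.copy()
    y_imp[~obs] = coef_ref[0] + coef_ref[1] * x[~obs]

    # Outcome model: OLS of the completed outcome on treatment and baseline.
    X = np.column_stack([np.ones(len(y)), z, x])
    beta, *_ = np.linalg.lstsq(X, y_imp, rcond=None)
    return float(beta[1])  # coefficient on the treatment indicator

# Simulated example data (hypothetical)
rng = np.random.default_rng(2022)
n = 200
x = rng.normal(size=n)
z = np.repeat([0, 1], n // 2)
y = 1.0 + 0.5 * x + 2.0 * z + rng.normal(size=n)
y[rng.random(n) < 0.2] = np.nan  # roughly 20% of outcomes missing

est = cond_mean_impute_estimate(y, x, z)
print(est)
```

Because nothing is drawn at random in the imputation step, calling the function again on the same data returns exactly the same estimate.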
Compared to the standard MI approach, the conditional mean approach gives a single point estimate without any Monte-Carlo error. This is attractive because one does not need to worry about how many multiple imputations are needed to get the Monte-Carlo error sufficiently low. Also, the fact that the results are non-stochastic is attractive from a reproducibility perspective, particularly in the context of trials being conducted for regulatory approval of new drugs.
What about the standard error, confidence interval, and p-value? While analytical approaches are possible, they are complicated and setup-specific. One approach is to use bootstrapping, but this has the drawback that the standard error estimate itself contains Monte-Carlo error, and again one has to figure out how many bootstrap samples to use. Thus in the paper we propose the jackknife as an alternative. This means that we re-calculate the conditional mean imputation point estimate of the treatment effect on each of the n datasets formed by excluding one patient at a time, and then use these n estimates to calculate the jackknife estimate of variance. Unlike the bootstrap (at least the feasible version of the bootstrap, where we take a finite number of bootstrap samples rather than all possible ones), the jackknife gives a non-stochastic standard error, from which we can calculate a confidence interval and p-value, assuming normality of the estimator.
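The leave-one-out calculation can be sketched in a few lines of Python. This is a generic illustration rather than the paper's implementation: the `jackknife` function and its arguments are hypothetical names, it uses the usual jackknife variance formula, (n-1)/n times the sum of squared deviations of the leave-one-out estimates from their mean, and it is demonstrated on the sample mean, where the jackknife standard error is known to equal the familiar s/sqrt(n).

```python
import numpy as np
from statistics import NormalDist

def jackknife(estimator, data):
    """Leave-one-out jackknife for estimator(data); rows of data are patients."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    estimate = float(estimator(data))

    # Re-compute the estimate n times, each time leaving out one patient.
    loo = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])

    # Standard jackknife variance estimate.
    var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
    se = float(np.sqrt(var))

    # Wald-type 95% confidence interval and two-sided p-value,
    # assuming the estimator is approximately normal.
    z975 = NormalDist().inv_cdf(0.975)
    ci = (estimate - z975 * se, estimate + z975 * se)
    p = 2 * (1 - NormalDist().cdf(abs(estimate / se)))
    return estimate, se, ci, p

# Toy check on the sample mean (hypothetical data)
rng = np.random.default_rng(1)
sample = rng.normal(loc=0.3, size=50)
est, se, ci, p = jackknife(np.mean, sample)
print(est, se, ci, p)
```

In the setting described here, `estimator` would instead be the whole pipeline (fit the imputation model, impute by conditional means, fit the outcome regression, return the treatment coefficient), so that the imputation model is re-fitted on each leave-one-out dataset. No random numbers enter the jackknife itself, so the standard error, interval and p-value are reproducible to the last digit.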
Postscript – 19th May 2022. This work is now available open access in the journal Pharmaceutical Statistics.