Tim Morris and I were recently discussing multiple imputation (MI) of covariates when one wants to assume that the covariate affects the outcome via a spline of some kind. We thought that the Substantive Model Compatible Full Conditional Specification (smcfcs) approach to MI should be able to handle this, provided we can specify the spline’s basis functions in a way that the smcfcs function (available in R and Stata) can handle. In this post I’ll show that it can be done in R, at least with a simple cubic spline setup.
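To make the idea of "specifying the spline's basis functions" concrete, here is a minimal base R sketch, not the code from the post itself. It writes a cubic spline with a single knot as a truncated power basis, so that each basis function is an explicit function of x inside the model formula. The knot location, the simulated data and the variable names are all assumptions made for illustration. Whether smcfcs accepts a formula string of exactly this form through its smformula argument is also an assumption here.

```r
# Sketch only: cubic spline in x with one knot, written as a truncated
# power basis so that every basis function is an explicit function of x.
set.seed(1)
n <- 200
x <- rnorm(n)
knot <- 0  # hypothetical knot location, chosen purely for illustration
y <- 1 + x - 0.5 * x^2 + 0.8 * pmax(x - knot, 0)^3 + rnorm(n)
dat <- data.frame(y = y, x = x)

# Outcome model formula with the basis functions written inline via I().
sm_formula <- sprintf("y ~ x + I(x^2) + I(x^3) + I(pmax(x - %s, 0)^3)", knot)
fit_inline <- lm(as.formula(sm_formula), data = dat)

# The same model fitted from an explicitly constructed basis matrix,
# to check that the inline formula really encodes the intended spline.
basis <- cbind(x, x^2, x^3, pmax(x - knot, 0)^3)
fit_basis <- lm(y ~ basis, data = dat)

# TRUE if both parameterisations give the same fitted values.
same_fit <- isTRUE(all.equal(unname(fitted(fit_inline)),
                             unname(fitted(fit_basis))))
```

Writing the basis inline means the spline terms are deterministic functions of x, so an imputation routine that understands the formula can recompute them whenever it draws a new value of x.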
Missing data
How many imputations with mice? Assessing Monte-Carlo error after multiple imputation in R
When using multiple imputation to handle missing data, one must sooner or later decide how many imputations to base inferences on. The validity of inferences does not depend on how many imputations are used, but the statistical efficiency of the inference can be increased by using more imputations. Moreover, we may want our results to be reproducible to a given precision, in the sense that if someone were to re-impute the same data using the same number of imputations but with a different random number seed, they would obtain the same estimates to the desired precision. For a great summary of the considerations in choosing how many imputations to use, see the corresponding section of Stef van Buuren’s book.
In this post I provide a small piece of R code which, given a pooled analysis after performing imputation using the mice package in R, calculates the so-called Monte-Carlo standard error of the multiple imputation point estimates. Stata has really nice functionality for doing this built into mi estimate.
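As a rough illustration of the quantity involved, and not the code from the post, the sketch below takes the Monte-Carlo standard error of an MI point estimate to be sqrt(B/M), where B is the between-imputation variance of the estimates and M is the number of imputations. The helper name mc_se and the per-imputation estimates are hypothetical. With mice, B and M are assumed to be available from the pooled object returned by pool().

```r
# Sketch only: Monte-Carlo standard error of an MI point estimate,
# computed from the estimates obtained in each imputed dataset.
# mc_se is a hypothetical helper name, not a function from mice.
mc_se <- function(estimates) {
  m <- length(estimates)   # number of imputations, M
  b <- var(estimates)      # between-imputation variance, B
  sqrt(b / m)
}

# Made-up per-imputation estimates of a single coefficient (illustrative).
est <- c(1.02, 0.97, 1.05, 0.99, 1.01)

pooled_estimate <- mean(est)  # Rubin's rules point estimate
mcse <- mc_se(est)            # its Monte-Carlo standard error
```

If the Monte-Carlo standard error is large relative to the precision you want to report, that suggests increasing the number of imputations. Because it scales as 1/sqrt(M), halving it takes roughly four times as many imputations.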
Conditional mean reference-based multiple imputation
The reference-based approach to imputing missing data has become popular in clinical trials, as I’ve blogged about previously. In the standard approach, the multiple imputations are generated as draws from the posterior distribution under a Bayesian model. With a continuous outcome, each of the imputed datasets is analysed using a linear regression model for the outcome (typically measured at the final time point), with treatment group and some baseline variables as covariates.
In a new pre-print available on arXiv, written with Marcel Wolbers and colleagues at Roche, we propose an alternative approach to reference-based imputation for continuous outcomes. This approach results in a treatment effect point estimate and (frequentist) standard error without any Monte-Carlo error.