Multiple imputation (MI) is a popular approach to handling missing data. In the final step of MI, inferences for parameter estimates are made based on simple rules developed by Rubin. These rules rely on the analyst having a calculable standard error for their parameter estimate for each imputed dataset. This is fine for standard analyses, e.g. regression models fitted by maximum likelihood, where standard errors based on asymptotic theory are easily calculated. However, for many analyses analytic standard errors are not available, or are prohibitively difficult to derive. For such analyses, if there were no missing data, an attractive approach for finding standard errors and confidence intervals is bootstrapping. However, if one is using MI to handle missing data, and would ordinarily use bootstrapping to find standard errors / confidence intervals, how should the two be combined?
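To make the role of the per-dataset standard errors concrete, here is a minimal sketch of Rubin's rules for a scalar parameter. The function name pool_rubin and the example numbers are purely illustrative and are not taken from any package; the inputs are the M point estimates and their squared standard errors.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    # Rubin's degrees of freedom for the t reference distribution
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = math.inf
    return {"estimate": q_bar, "se": math.sqrt(total), "df": df}


# Made-up estimates and squared standard errors from three imputed datasets
pooled = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled)
```

The within-imputation term is exactly where a standard error for each imputed dataset is needed, which is why the rules cannot be applied directly when no analytic standard error is available.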
This week I've released a new version of the smcfcs package for R on CRAN. SMC-FCS performs multiple imputation for missing covariates in regression models, using an adaptation of the chained equations / fully conditional specification approach to imputation, which we called Substantive Model Compatible Fully Conditional Specification MI.
The new version of smcfcs now supports Poisson regression outcome / substantive models, which are often used for count outcomes. Future versions will add support for negative binomial regression models, which are often used to model overdispersed count outcomes, and for offsets, which are often needed when fitting count regression models.
The missing at random (MAR) assumption plays an extremely important role in the context of analysing datasets subject to missing data. Its importance lies primarily in the fact that if we are willing to assume data are MAR, we can identify (estimate) target parameters. There are a variety of methods for handling data which are assumed to be MAR. One approach is estimation of a model for the variables of interest using the method of maximum likelihood. In the context of randomised trials, primary analyses are sometimes based on methods which are valid under MAR, such as linear mixed models (MMRM). A key concern, however, is whether the MAR assumption is plausible in any given situation.
For any users of my R package smcfcs, I've just released a new version (1.1.1), which along with a few small changes, includes a critical bug fix. The bug affected imputation of categorical variables (both binary variables and those with more than two levels) when the substantive model is linear regression (other substantive model types were not affected). All users should update to the new version, which is available on CRAN.
A concern when analysing data with missing values is that the missing at random (MAR) assumption, upon which a number of methods rely, does not hold. When the MAR assumption is in doubt, ideally we should perform sensitivity analyses, whereby we assess how sensitive our conclusions are to plausible deviations from MAR. One route to performing such a sensitivity analysis, which is convenient if one has already performed multiple imputation (assuming MAR), is the weighting method proposed by Carpenter et al in 2007. This involves applying a weighted version of Rubin's rules to the parameter estimates obtained from the MAR imputations, with the weight given to a particular imputation estimate depending on how plausible the imputations in that dataset are under an assumed missing not at random (MNAR) mechanism. The method is appealing because, computationally, it requires relatively little additional effort once MAR imputations have been generated.
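As a rough sketch of what a weighted version of Rubin's rules looks like, the code below pools estimates using normalised weights. Several things here are assumptions rather than details given above: the function name pool_weighted, the specific variance formula (weighted within-imputation variance plus (1 + 1/M) times a weighted between-imputation variance), and the example weights, which are taken to be proportional to exp(-delta * s_m), where s_m is the sum of the imputed values in dataset m and delta is the MNAR sensitivity parameter. The exact definitions should be checked against the original paper before any real use.

```python
import math


def pool_weighted(estimates, variances, log_weights):
    """Pool estimates with normalised importance weights (illustrative sketch)."""
    m = len(estimates)
    # Normalise on the log scale so large |log weights| do not overflow
    largest = max(log_weights)
    raw = [math.exp(lw - largest) for lw in log_weights]
    total_raw = sum(raw)
    weights = [r / total_raw for r in raw]

    estimate = sum(w * q for w, q in zip(weights, estimates))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (q - estimate) ** 2 for w, q in zip(weights, estimates))
    total = within + (1 + 1 / m) * between  # assumed form of the total variance
    return estimate, math.sqrt(total), weights


# Made-up inputs: three MAR imputations and an assumed sensitivity parameter
delta = 0.5
imputed_sums = [1.0, 2.0, 3.0]  # hypothetical sum of imputed values per dataset
log_w = [-delta * s for s in imputed_sums]  # assumed weight form, see lead-in
est, se, w = pool_weighted([1.0, 1.2, 0.8], [0.04, 0.05, 0.03], log_w)
print(est, se, w)
```

With equal weights the point estimate reduces to the ordinary average of the imputation-specific estimates, which is a quick sanity check on any implementation.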
In an important paper just published by Rezvan et al in BMC Medical Research Methodology, the performance of this weighting method has been explored through a series of simulation studies. In summary, they find that the method does not recover unbiased estimates, even when the number of imputations used is large and the correct (true) value of the MNAR sensitivity parameter is used. The paper explains in detail possible reasons for the failure of the method, but the summary conclusion is that the weighting method ought not to be used for performing MNAR sensitivity analyses after MAR multiple imputation.
What might one do as an alternative? One option is to perform a selection model MNAR sensitivity analysis using software such as WinBUGS or JAGS, in which the substantive model and selection (missingness) model are jointly fitted, and one uses an informative prior for the sensitivity parameter. A further alternative, which like the weighting approach can (in certain situations) exploit multiple imputations generated assuming MAR, is the pattern mixture approach, whereby the MAR imputations are modified to reflect an assumed MNAR mechanism. The modified imputations can then be analysed and the results combined using Rubin's rules in the usual way.
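One simple way of modifying MAR imputations, sketched below, is a delta adjustment: every imputed value is shifted by a fixed amount delta, the analysis is run on each modified dataset, and the results are pooled with Rubin's rules. This is only one possible form of pattern mixture adjustment, chosen here for illustration. The function names, the toy data and the choice of a sample mean as the analysis are all hypothetical.

```python
import math
from statistics import mean, variance


def delta_adjust(values, missing, delta):
    """Shift only the originally missing (imputed) values by delta."""
    return [y + delta if is_missing else y for y, is_missing in zip(values, missing)]


def pooled_mean(imputed_datasets, missing, delta):
    """Estimate a mean from delta-adjusted imputations, pooled by Rubin's rules."""
    m = len(imputed_datasets)
    estimates, variances = [], []
    for data in imputed_datasets:
        adjusted = delta_adjust(data, missing, delta)
        estimates.append(mean(adjusted))
        variances.append(variance(adjusted) / len(adjusted))  # squared SE of the mean
    q_bar = mean(estimates)
    total = mean(variances) + (1 + 1 / m) * variance(estimates)
    return q_bar, math.sqrt(total)


# Toy example: four subjects, the last two had missing outcomes
missing = [False, False, True, True]
imputations = [
    [1.0, 2.0, 3.0, 2.0],
    [1.0, 2.0, 2.0, 3.0],
    [1.0, 2.0, 2.5, 2.5],
]
est_mar, se_mar = pooled_mean(imputations, missing, delta=0.0)
est_mnar, se_mnar = pooled_mean(imputations, missing, delta=-1.0)
print(est_mar, est_mnar)
```

For a mean, the shift in the pooled estimate equals delta multiplied by the proportion of missing values, so repeating the analysis over a range of delta values shows directly how far the conclusions move as the assumed departure from MAR grows.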
Today I gave a seminar at the Centre for Biostatistics, University of Manchester, as part of a three-seminar afternoon on missing data. My talk described recent work on methods for handling missing covariates in competing risks analysis, with a focus on when complete case analysis is valid and on multiple imputation approaches. For the latter, our substantive model compatible adaptation of fully conditional specification now supports competing risks analysis, both in R and Stata (see here).
The slides of my talk are available here.
Update 13th May 2016: the corresponding paper is now available (open access) here.