Substantive model compatible imputation of covariates – smcfcs in R

I’m pleased to announce the release of an R package, smcfcs, which implements multiple imputation of missing covariates using substantive model compatible fully conditional specification. As described in a previous post, this is a modified version of the popular fully conditional specification, or chained equations, approach to multiple imputation (e.g. as implemented in the excellent MICE package).

smcfcs is an attractive approach when the outcome or substantive model includes interactions or non-linear covariate effects, or is itself a non-linear model, such as Cox’s proportional hazards model. In these cases it can be difficult, or sometimes impossible, to directly specify an imputation model for a partially observed covariate that is compatible with the outcome/substantive model. Such incompatibility means the imputation model is mis-specified, which can bias the resulting estimates. smcfcs resolves this problem by imputing each partially observed covariate from an imputation model that is compatible with a user-specified outcome/substantive model.
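To make the idea concrete, here is a minimal Python sketch of the rejection sampling step that SMC-FCS uses for a continuous covariate. It covers only the simplest case: one partially observed covariate X, a fully observed outcome Y, and an assumed linear substantive model with a quadratic term, Y = b0 + b1 X + b2 X^2 + e. This is not the smcfcs package’s code or API. The function name `smc_impute_quadratic`, the normal covariate model, the starting values and the simulated data are all assumptions chosen for illustration. Proposals are drawn from the covariate model and accepted with probability f(Y | X*) / max f(Y | x), so accepted values are compatible with the quadratic outcome model.

```python
import numpy as np


def smc_impute_quadratic(x, y, n_iter=10, rj_limit=1000, rng=None):
    """Illustrative SMC-FCS-style imputation of one covariate (NaN = missing).

    Assumed substantive model: y = b0 + b1*x + b2*x**2 + e, e ~ N(0, s2).
    Assumed covariate model:   x ~ N(mu, tau2).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    miss = np.isnan(x)
    n = len(y)
    x[miss] = np.nanmean(x)  # crude starting values

    for _ in range(n_iter):
        # 1. Fit the substantive model to the current completed data and
        #    draw (beta, s2) from their approximate posterior.
        design = np.column_stack([np.ones(n), x, x ** 2])
        xtx_inv = np.linalg.inv(design.T @ design)
        beta_hat = xtx_inv @ design.T @ y
        rss = np.sum((y - design @ beta_hat) ** 2)
        s2 = rss / rng.chisquare(n - 3)
        beta = rng.multivariate_normal(beta_hat, s2 * xtx_inv)

        # 2. Fit the covariate model and draw (mu, tau2).
        tau2 = np.sum((x - x.mean()) ** 2) / rng.chisquare(n - 1)
        mu = rng.normal(x.mean(), np.sqrt(tau2 / n))

        # 3. Rejection sampling: propose x* from the covariate model and accept
        #    with probability f(y | x*) / max_x f(y | x) = exp(-resid**2 / (2*s2)).
        todo = np.flatnonzero(miss)
        for _ in range(rj_limit):
            if todo.size == 0:
                break
            prop = rng.normal(mu, np.sqrt(tau2), size=todo.size)
            resid = y[todo] - (beta[0] + beta[1] * prop + beta[2] * prop ** 2)
            accept = rng.uniform(size=todo.size) < np.exp(-resid ** 2 / (2 * s2))
            x[todo[accept]] = prop[accept]
            todo = todo[~accept]
        # Anyone still in `todo` after rj_limit tries keeps their previous value.
    return x


# Usage on simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(2024)
n = 2000
x_true = rng.normal(size=n)
y = 1 + x_true + x_true ** 2 + rng.normal(size=n)
x_obs = x_true.copy()
x_obs[rng.uniform(size=n) < 0.3] = np.nan  # 30% missing completely at random

x_imp = smc_impute_quadratic(x_obs, y, rng=rng)
b = np.linalg.lstsq(np.column_stack([np.ones(n), x_imp, x_imp ** 2]), y, rcond=None)[0]
print(b.round(2))  # should be close to the true coefficients (1, 1, 1)
```

Each iteration also draws the model parameters from approximate posteriors, so repeating the sketch with different seeds gives approximately proper multiple imputations. The real package additionally handles other covariates, categorical variables and the other outcome model types.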

The R version of smcfcs is available on CRAN. It supports linear and logistic regression outcome models, as well as Cox proportional hazards models for censored time-to-event outcomes. Competing risks outcomes can also be accommodated by specifying a Cox model for each cause-specific hazard function. A Stata version is also available, and can be installed from the SSC archive from within Stata using: ssc install smcfcs

Including the outcome in imputation models of covariates

Multiple imputation has become a popular approach for handling missing data (see www.missingdata.org.uk). Suppose that we have an outcome Y (the dependent variable in our model of interest) and a covariate X. Suppose further that X contains some missing values, and that we are happy to assume these are missing at random. Then we might consider using multiple imputation to impute the missing values in X. A natural question that follows is whether Y should be included as a covariate in the imputation model for X. Particularly when Y is measured later in time than X, our intuition may suggest that it is inappropriate to use the future information contained in Y when imputing X. This, however, is not the case: omitting Y makes the imputed values of X independent of Y, which biases the estimated association between X and Y towards zero.
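A quick simulation illustrates the point. The Python sketch below is an illustration rather than anything from the post or a package. It imputes X by a single stochastic regression imputation (normal residuals, parameter uncertainty ignored, so not proper multiple imputation), either from an intercept-only model or from a model that includes Y, and then estimates the slope of Y on X. The helper names `impute_x_normal` and `slope`, the 40% missing-completely-at-random mechanism and the true slope of 2 are assumptions chosen for illustration.

```python
import numpy as np


def impute_x_normal(x, predictors, rng):
    """Single stochastic regression imputation of x (NaN = missing).

    Fits x ~ predictors by OLS on the complete cases, then fills each missing x
    with its prediction plus a random normal residual. An empty `predictors`
    list gives an intercept-only imputation model.
    """
    miss = np.isnan(x)
    design = np.column_stack([np.ones(len(x))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design[~miss], x[~miss], rcond=None)
    resid_sd = np.std(x[~miss] - design[~miss] @ coef, ddof=design.shape[1])
    filled = x.copy()
    filled[miss] = design[miss] @ coef + rng.normal(0, resid_sd, miss.sum())
    return filled


def slope(x, y):
    """OLS slope of y on x."""
    return np.polyfit(x, y, 1)[0]


rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)            # true slope = 2
x_obs = x.copy()
x_obs[rng.uniform(size=n) < 0.4] = np.nan   # 40% missing completely at random

without_y = slope(impute_x_normal(x_obs, [], rng), y)
with_y = slope(impute_x_normal(x_obs, [y], rng), y)
print(round(without_y, 2), round(with_y, 2))
```

With these settings, leaving Y out pulls the estimated slope towards zero (to roughly 60% of its true value, matching the fraction of observed X), because the imputed values carry no information about their association with Y. Including Y recovers a slope close to 2.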


Conditional randomization, standardization, and inverse probability weighting

In a previous post, I began working through Miguel Hernán and James Robins’ soon-to-be-published book, Causal Inference. There I gave an overview of the first topics they cover, namely potential outcomes, causal effects, and randomization. In this post I’ll continue with some personal notes on the remaining parts of Chapter 2 of their book, covering conditional randomization, standardization, and inverse probability weighting.
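For a single binary confounder, both estimators can be written down in a few lines. The Python sketch below is an illustration, not code from the book. It uses the book’s notation of treatment A, outcome Y and confounder L; the function names and the simulated data-generating values are assumptions. Standardization averages the stratum-specific means E[Y | A = a, L = l] over the distribution of L, while IP weighting reweights each individual who received treatment level a by 1 / Pr[A = a | L].

```python
import numpy as np


def standardized_mean(y, a, l, treat):
    """Standardized mean of y under treatment level `treat`:
    sum over l of E[Y | A=treat, L=l] * Pr[L=l]."""
    return sum(
        y[(a == treat) & (l == lev)].mean() * np.mean(l == lev)
        for lev in np.unique(l)
    )


def ipw_mean(y, a, l, treat):
    """IP-weighted mean of y under treatment level `treat`, with weights
    1 / Pr[A=treat | L] estimated nonparametrically within strata of L.
    Assumes positivity: every stratum of L has someone with A == treat."""
    p = np.empty(len(y))
    for lev in np.unique(l):
        stratum = l == lev
        p[stratum] = np.mean(a[stratum] == treat)
    return np.mean((a == treat) * y / p)


# Simulated data (illustrative assumptions): L confounds A and Y.
rng = np.random.default_rng(0)
n = 10_000
l = rng.binomial(1, 0.4, n)                      # binary confounder L
a = rng.binomial(1, np.where(l == 1, 0.7, 0.3))  # treatment depends on L
y = rng.binomial(1, 0.2 + 0.1 * a + 0.3 * l)     # true causal risk difference 0.1

estimates = {
    treat: (standardized_mean(y, a, l, treat), ipw_mean(y, a, l, treat))
    for treat in (0, 1)
}
print(estimates)
```

When Pr[A = a | L] is estimated nonparametrically within strata of L, as here, the two estimates coincide exactly (up to floating point), which makes a useful check. They generally differ once parametric models are used instead. If positivity fails, the division in `ipw_mean` and the empty-stratum mean in `standardized_mean` both break down.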
