Someone recently asked me about using substantive model compatible imputation, as implemented in smcfcs in R, to impute missing covariates, followed by fitting Fine and Gray models for the cumulative incidence functions using the crr function in the cmprsk package.
The smcfcs packages in R and Stata have had functionality for imputing missing covariates in the competing risks setting for a number of years. This implementation is based on assuming Cox proportional hazards models for the cause-specific hazard functions, as described by Bartlett and Taylor 2016. At the end of this paper we noted that further research was needed when one is interested in fitting models (after imputation) for the cumulative incidence functions, as per the Fine and Gray model.
Imputing using smcfcs followed by crr
You can certainly fit such models after using smcfcs to impute missing covariate values. I will illustrate using the competing risks example dataset ex_compet included in the R version of smcfcs. It contains the event time t, the event indicator d (0 = censored, 1 or 2 = failure from cause 1 or 2) and two partially observed covariates: x1, which is binary, and x2, which is normally distributed. The two covariates are independent of each other. First I load the smcfcs package and then impute the missing covariate values using the smcfcs function:
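library(smcfcs)
data(ex_compet)
set.seed(512312)
m <- 5  # number of imputations
# One Cox model per cause-specific hazard. method gives the imputation model
# for each column of ex_compet: "" for the fully observed t and d,
# logistic regression for binary x1, normal linear regression for x2
imps <- smcfcs(ex_compet, smtype="compet",
               smformula=c("Surv(t,d==1)~x1+x2", "Surv(t,d==2)~x1+x2"),
               method=c("","","logreg","norm"), m=m)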
In the examples for smcfcs in R I use the excellent mitools package to fit the substantive/analysis model to the imputed datasets. As far as I can see, the crr function in the cmprsk package does not have a data argument for passing it a data frame. Instead it has separate arguments for the vector of event times, the vector of failure indicators and a matrix of covariates. I think a slightly more manual approach than usual is therefore required to fit the Fine and Gray model to each imputed dataset and combine the estimates and standard errors using Rubin's rules. The following code does this. I have chosen to fit a Fine and Gray model for the second cause of failure, assuming that the sub-distribution hazard ratios for the two covariates do not change over time.
library(cmprsk)
ests <- list()
vars <- list()
for (i in 1:m) {
  dat <- imps$impDatasets[[i]]
  # fit crr model for the second cause of failure (failcode=2),
  # treating d==0 as censored (cencode=0)
  mod <- crr(dat$t, dat$d,
             cov1=dat[, c("x1", "x2")],
             failcode=2, cencode=0)
  ests[[i]] <- mod$coef        # log sub-distribution hazard ratios
  vars[[i]] <- diag(mod$var)   # their variances
}
library(mitools)
# combine the m sets of results using Rubin's rules
summary(MIcombine(ests,vars))
which results in:
Multiple imputation results:
MIcombine.default(ests, vars)
results se (lower upper) missInfo
x1 -1.4117197 0.37320106 -2.2729652 -0.5504743 76 %
x2 -0.3301149 0.09148286 -0.5261276 -0.1341023 59 %
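For readers who want to see what MIcombine does with the estimates and variances collected in the loop, the following is a minimal base R sketch of Rubin's rules. The function name rubin_pool is hypothetical (it is not part of mitools), and it assumes each element of vars holds the variances of the estimates; if a full covariance matrix is supplied, only its diagonal is used. The degrees of freedom use Rubin's large-sample formula without a small-sample correction, which as far as I understand mirrors what MIcombine does by default, but treat it as an illustration rather than a replacement for mitools.

```r
# Minimal sketch of Rubin's rules (hypothetical helper, not part of mitools).
# ests: list of m coefficient vectors; vars: list of m variance vectors
# (or covariance matrices, in which case only the diagonal is used).
rubin_pool <- function(ests, vars, level = 0.95) {
  m <- length(ests)
  est_mat <- do.call(rbind, ests)
  var_mat <- do.call(rbind, lapply(vars, function(v) if (is.matrix(v)) diag(v) else v))
  qbar <- colMeans(est_mat)          # pooled point estimates
  ubar <- colMeans(var_mat)          # mean within-imputation variance
  b <- apply(est_mat, 2, var)        # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance
  r <- (1 + 1 / m) * b / ubar        # relative increase in variance due to missingness
  df <- (m - 1) * (1 + 1 / r)^2      # Rubin's degrees of freedom
  crit <- qt(1 - (1 - level) / 2, df)
  data.frame(est = qbar, se = sqrt(total),
             lower = qbar - crit * sqrt(total),
             upper = qbar + crit * sqrt(total),
             missInfo = (r + 2 / (df + 3)) / (r + 1))
}
```

Applied to the lists from the loop, rubin_pool(ests, vars) should give estimates and standard errors matching the MIcombine summary.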
The MIcombine results are the MI estimates of the log sub-distribution hazard ratios for x1 and x2 for the cumulative incidence function of the second cause of failure. Both estimates are negative, suggesting that each covariate reduces the cumulative incidence of the second cause of failure, adjusting for the other covariate. In contrast, the data were simulated from Cox models for the cause-specific hazards in which both x1 and x2 increased the first cause-specific hazard, while only x1 had an effect (to decrease it) on the second cause-specific hazard. The explanation for the difference is that increasing x2 increases the hazard of failing from cause 1, so individuals with larger values of x2 are more likely to fail from cause 1 first, and are therefore less likely ever to fail from cause 2.
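To see why a covariate that affects only the cause 1 hazard can still change the cumulative incidence of cause 2, it helps to consider the simplest case, in which both cause-specific hazards are constant (exponential). The overall survival function is then exp(-(lambda1 + lambda2) t), and the cumulative incidence of cause 2 is lambda2 / (lambda1 + lambda2) * (1 - exp(-(lambda1 + lambda2) t)). The sketch below evaluates this with hypothetical values: both baseline hazards equal to 0.1, and a covariate that doubles the cause 1 hazard while leaving the cause 2 hazard unchanged. These are not the values used to simulate ex_compet.

```r
# Cumulative incidence of cause k at time t when both cause-specific
# hazards are constant (an illustrative assumption)
cif_const <- function(t, lambda1, lambda2, cause = 2) {
  lam_k <- if (cause == 1) lambda1 else lambda2
  lam_k / (lambda1 + lambda2) * (1 - exp(-(lambda1 + lambda2) * t))
}

# Hypothetical scenario: the covariate doubles the cause 1 hazard
# but has no effect on the cause 2 hazard
lambda1 <- 0.1
lambda2 <- 0.1
hr1 <- 2
t <- 5
cif2_x0 <- cif_const(t, lambda1, lambda2)        # covariate absent
cif2_x1 <- cif_const(t, hr1 * lambda1, lambda2)  # covariate present
print(c(cif2_x0, cif2_x1))
```

At time 5 this gives roughly 0.316 without the covariate and 0.259 with it: the cause 2 cumulative incidence falls even though the cause 2 hazard is untouched.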
Compatibility between models for cause specific hazards and cumulative incidence functions
As we have seen, it is perfectly possible to fit Fine and Gray models after using smcfcs to impute missing covariates. But is the approach taken here sensible? One concern is whether the Cox models for the cause-specific hazards assumed by smcfcs during imputation are compatible with the proportional hazards model for the sub-distribution hazard assumed by the Fine and Gray model I fitted. As implemented here, I assumed all of the hazard ratios concerned were constant over time. Unfortunately, such models for the cause-specific hazards and the sub-distribution hazards generally cannot both be correct at the same time (i.e. they are not compatible). In their original paper, Fine and Gray stated that time-by-covariate interactions are to be anticipated in applications, I think because such interactions in the sub-distribution hazard arise if one assumes that the covariates' hazard ratios for the cause-specific hazard functions are constant over time. That the two models cannot both be correctly specified is explored in detail in a paper by Grambauer et al 2010.
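The incompatibility can be seen directly in the constant-hazards setting. If the cause-specific hazards lambda1 and lambda2 are constant, cause 2 has sub-density f2(t) = lambda2 exp(-(lambda1 + lambda2) t), and its sub-distribution hazard is f2(t) / (1 - F2(t)), where F2 is the cumulative incidence of cause 2. The sketch below, using hypothetical hazard values, computes the sub-distribution hazard ratio implied by a covariate with a constant cause-specific hazard ratio of 2 for cause 1 and no effect on the cause 2 hazard.

```r
# Sub-distribution hazard of cause 2 implied by constant cause-specific
# hazards lambda1 and lambda2 (an illustrative assumption)
subdist_haz2 <- function(t, lambda1, lambda2) {
  total <- lambda1 + lambda2
  f2 <- lambda2 * exp(-total * t)                # sub-density of cause 2
  F2 <- lambda2 / total * (1 - exp(-total * t))  # cumulative incidence of cause 2
  f2 / (1 - F2)
}

times <- c(0, 1, 5, 10, 20)
lambda1 <- 0.1
lambda2 <- 0.1
# Covariate doubles the cause 1 hazard, no effect on the cause 2 hazard
shr <- subdist_haz2(times, 2 * lambda1, lambda2) / subdist_haz2(times, lambda1, lambda2)
print(round(shr, 2))
```

The implied sub-distribution hazard ratio is 1 at time zero and then falls steadily (roughly 0.90, 0.56, 0.31 and 0.10 at times 1, 5, 10 and 20), so a Fine and Gray model with a constant sub-distribution hazard ratio for this covariate cannot be correct here.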
What does this mean in practice for imputing missing covariates in competing risks data? Based on my current understanding, my best advice is that one can impute using smcfcs, but should check carefully that the Cox proportional hazards assumptions it makes for the cause-specific hazard functions are reasonable for the data. If one finds evidence that the cause-specific hazard ratios of some covariates change over time, it may be possible to use extensions of smcfcs that handle such time-varying effects, as described by my colleagues Ruth Keogh and Tim Morris. As far as I know, though (I haven't spoken to them about this yet), their implementation and code are for standard survival data rather than competing risks. I think further work would therefore be required to merge their extension into the main smcfcs code, which accommodates competing risks.
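One way to check the proportional hazards assumptions for the cause-specific hazards is to fit a Cox model for each cause with coxph from the survival package, treating failures from the other cause as censored, and then apply cox.zph, which tests for non-proportionality using scaled Schoenfeld residuals. In practice one would do this on each imputed dataset (imps$impDatasets[[i]]). To keep the sketch below self-contained it instead simulates hypothetical data with constant cause-specific hazards, using the same variable names as ex_compet; the simulation parameters are made up and are not those used to generate ex_compet.

```r
library(survival)

set.seed(1)
n <- 1000
x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
# Hypothetical constant cause-specific hazards, simulated via latent
# exponential failure times for each cause, plus uniform censoring
t1 <- rexp(n, rate = 0.1 * exp(0.5 * x1 + 0.5 * x2))
t2 <- rexp(n, rate = 0.1 * exp(-0.5 * x1))
cens <- runif(n, 0, 20)
t <- pmin(t1, t2, cens)
d <- ifelse(t == t1, 1, ifelse(t == t2, 2, 0))
dat <- data.frame(t, d, x1, x2)

# Cause-specific Cox models: failures from the other cause treated as censored
fit1 <- coxph(Surv(t, d == 1) ~ x1 + x2, data = dat)
fit2 <- coxph(Surv(t, d == 2) ~ x1 + x2, data = dat)

# Tests of proportional hazards based on scaled Schoenfeld residuals
zph1 <- cox.zph(fit1)
zph2 <- cox.zph(fit2)
print(zph1)
print(zph2)
```

A small p-value for a covariate suggests its cause-specific hazard ratio varies over time; plot(zph1) shows the estimated time-varying coefficients, which is often more informative than the test alone.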
Having imputed the covariates using smcfcs, one should then check carefully whether the sub-distribution hazard ratios are constant over time. I am not an expert on the use of Fine and Gray models, but my initial thought is that if the sub-distribution hazard ratio for a particular covariate of interest is not constant over time, it may be preferable to fit cause-specific hazard models for the different causes, and from these estimate and plot marginal cumulative incidence functions showing the effect of the covariate of interest. There may, however, be other reasons to prefer the Fine and Gray modelling approach that I have not appreciated.
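To check whether a sub-distribution hazard ratio is constant, crr accepts time-varying covariate effects through its cov2 and tf arguments: each column of cov2 is multiplied by the corresponding column of the matrix returned by tf, evaluated at the failure times. The sketch below, again on simulated hypothetical data rather than ex_compet, adds an x2 by log(time) term to the Fine and Gray model for the second cause. The choice of log(time) as the time function is an assumption; other functions of time could be used. With imputed data one would fit this model to each imputed dataset and combine the estimates using Rubin's rules, as in the loop above.

```r
library(cmprsk)

set.seed(2)
n <- 1000
x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
# Hypothetical constant cause-specific hazards (latent exponential times)
t1 <- rexp(n, rate = 0.1 * exp(0.5 * x1 + 0.5 * x2))
t2 <- rexp(n, rate = 0.1 * exp(-0.5 * x1))
cens <- runif(n, 0, 20)
ftime <- pmin(t1, t2, cens)
fstatus <- ifelse(ftime == t1, 1, ifelse(ftime == t2, 2, 0))

# Fine and Gray model for cause 2: constant effects of x1 and x2 (cov1),
# plus an x2 * log(time) term (cov2 multiplied by the output of tf)
fit_tv <- crr(ftime, fstatus,
              cov1 = cbind(x1 = x1, x2 = x2),
              cov2 = cbind(x2 = x2),
              tf = function(uft) cbind(log(uft)),
              failcode = 2, cencode = 0)
print(fit_tv)
```

An interaction coefficient clearly different from zero suggests that the sub-distribution hazard ratio for x2 changes over time.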
Lastly, as noted in the final sentence of Bartlett and Taylor 2016, it would be interesting to investigate how to impute covariates in a way that is compatible with a Fine and Gray model for one particular cause of interest.