Does a Bernoulli/binomial model really assume everyone has the same probability p?

When you estimate a proportion and want to calculate a standard error for the estimate, you would normally do so by assuming that the number of ‘successes’ in the sample is a draw from a binomial distribution, which counts the number of successes in a series of n independent Bernoulli 0/1 draws, where each draw has a probability p of ‘success’. Does the model rely on or assume that for each of these binary observations the success probability is the same? In the third paragraph of this blog post Frank Harrell argues (or seems to) that it does. In this post I’ll delve into this a bit further, using the same numerical example Frank gives.

Suppose we have a random sample of n individuals on whom we observe a binary outcome indicating presence or absence of disease. Suppose that in a sample of n=100, 40 have the disease, and so our estimate of the proportion with disease in the population (which I will denote p) from which the sample was drawn is \hat{p}=40/100=0.4.
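
Under the binomial model, the estimated standard error of \hat{p} is \sqrt{\hat{p}(1-\hat{p})/n}. A minimal sketch in R for this example:

# estimated proportion and its binomial model based standard error
n <- 100
x <- 40
phat <- x / n
sqrt(phat * (1 - phat) / n)
#> [1] 0.04898979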

Read more

Multiple imputation with flexible parametric survival models

Following a recent request, I’ve extended the functionality of my R package smcfcs, which performs multiple imputation of missing covariates compatibly with a user-specified substantive (outcome) model. The package can now impute compatibly with a Royston-Parmar type flexible parametric survival model. In this post I’ll briefly highlight some of the potential uses of this new functionality.
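
To give a sense of the model class involved, a Royston-Parmar flexible parametric survival model represents the log cumulative hazard as a natural cubic spline in log time, and can be fitted in R with the flexsurv package. A minimal sketch, with hypothetical dataset and variable names:

library(flexsurv)

# Royston-Parmar model with 2 internal knots, modelling the log cumulative
# hazard as a spline in log time (mydata, time, status, x1, x2 are placeholders)
fit <- flexsurvspline(Surv(time, status) ~ x1 + x2,
                      data = mydata, k = 2, scale = "hazard")
summary(fit)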

Read more

Multiple imputation for missing covariates in the Fine & Gray model for competing risks

Competing risks and the Fine & Gray model

In the setting of competing risks, one approach involves modelling the effects of covariates on the so-called cause specific hazard functions for each of the causes. An alternative is to model covariate effects on the cumulative incidence of one or more of the causes. The effects of covariates on the cumulative incidence of one cause (failure type), say cause 1, depend on the covariates’ effects on all of the cause specific hazard functions. This can be seen intuitively from the fact that one way a covariate can increase the chance that an individual fails from cause 1 is by reducing the hazard of failure from the other causes, meaning the individual has more opportunity to fail from cause 1.
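
To make this concrete, suppose the cause specific hazards are constant, h_1 and h_2. Then the cumulative incidence of cause 1 is F_1(t) = \frac{h_1}{h_1+h_2}\left(1-e^{-(h_1+h_2)t}\right). The following sketch (with hazard values chosen purely for illustration) shows that lowering the cause 2 hazard increases the cumulative incidence of cause 1, even though the cause 1 hazard is unchanged:

# cumulative incidence of cause 1 under constant cause specific hazards
cuminc1 <- function(t, h1, h2) h1 / (h1 + h2) * (1 - exp(-(h1 + h2) * t))

# cause 1 hazard fixed at 0.1; vary the cause 2 hazard
cuminc1(t = 5, h1 = 0.1, h2 = 0.2)  # 0.259
cuminc1(t = 5, h1 = 0.1, h2 = 0.05) # 0.352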

For modelling covariate effects on cumulative incidence, the most popular regression approach is the Fine & Gray model, which assumes a proportional hazards model for the subdistribution hazard for the cause of interest (again, I’ll call this cause/failure type 1). In the (unusual) situation where no event times are censored, this model can be fitted using standard software for the Cox proportional hazards model, with individuals who fail from causes other than cause 1 kept in the risk set at all observed event times. When there is censoring but the censoring times are known for all individuals, even those who were observed to have an event (termed censoring complete by Fine & Gray), as would be the case when censoring is administrative, individuals who fail from causes other than cause 1 remain in the risk set until their potential censoring time. Often, however, we have censoring of other forms, such that the censoring times are not known for individuals who are observed to fail (experience an event).
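
As an illustration of how the modified risk sets work in practice, the finegray function in the survival package constructs a weighted, expanded dataset in which individuals who fail from other causes remain in the risk set, after which an ordinary weighted Cox fit yields the Fine & Gray model. A minimal sketch, with hypothetical dataset and variable names (the event variable should be a factor whose first level denotes censoring):

library(survival)

# expand the data for cause 1 (mydata, time, event, x1, x2 are placeholders)
fgdata <- finegray(Surv(time, event) ~ ., data = mydata, etype = "cause1")

# weighted Cox fit on the expanded risk sets gives the Fine & Gray model for cause 1
fgfit <- coxph(Surv(fgstart, fgstop, fgstatus) ~ x1 + x2,
               weights = fgwt, data = fgdata)
summary(fgfit)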

Missing covariates in the Fine & Gray model

In practice one or more of the covariates we wish to include in the model may have missing values. Suppose one wishes to use multiple imputation to impute these missing covariate values. How should the imputation be performed, given that a Fine & Gray model is of interest? This question is addressed in a paper by Edouard Bonneville and colleagues, a pre-print of which is now available on arXiv. I shall not go into the details of the paper here, except to give a brief overview of one of the main contributions. This exploits the fact that in the setting of time-to-event data with one type of failure, imputation methods are well developed (White and Royston 2009, Bartlett et al 2015), and that, as described above, when data are censoring complete, the Fine & Gray model can be fitted using a Cox model with a modified risk set definition. As such, in the usual situation where censoring times are not available for everyone, Edouard’s paper proposes to 1) impute the missing censoring times using Kaplan-Meier based imputation, and then 2) apply imputation methods for missing covariates developed for the single failure type Cox model setting. The paper proposes approaches for step 2) based on MICE imputation and also on the SMC-FCS approach we developed in earlier work (Bartlett et al 2015).

smcfcs for the Fine & Gray model in R

The proposal based on the SMC-FCS approach is now available in the smcfcs R package, thanks to Edouard. The following code, taken from the example in the documentation of the new smcfcs.finegray function, illustrates the relatively simple workflow using this extension:

library(survival)
library(kmi)
library(smcfcs)

imps <- smcfcs.finegray(
  originaldata = ex_finegray,
  # substantive (Fine & Gray) model for the subdistribution hazard of cause 1
  smformula = "Surv(times, d) ~ x1 + x2",
  # one entry per column of originaldata: "" for fully observed variables,
  # "logreg" for the binary covariate x1, "norm" for the continuous x2
  method = c("", "", "logreg", "norm"),
  cause = 1,
  # arguments passed to kmi() for imputing the missing censoring times;
  # ~ 1 estimates the censoring distribution without covariates
  kmi_args = list("formula" = ~ 1)
)

library(mitools)
impobj <- imputationList(imps$impDatasets)
# Important: use Surv(newtimes, newevent) ~ ... when pooling
# (respectively: subdistribution time and indicator for cause of interest)
models <- with(impobj, coxph(Surv(newtimes, newevent) ~ x1 + x2))
summary(MIcombine(models))

If you’re interested to learn more, please take a look at Edouard’s paper on arXiv.