Multiple imputation and its application – 2nd edition published

I am delighted to write this blog post announcing the publication of the second edition of the book ‘Multiple Imputation and its Application’, published by Wiley, of which I am a co-author along with colleagues James Carpenter, Tim Morris, Angela Wood, Matteo Quartagno, and Mike Kenward.

Key additions in the second edition are:

  • an in-depth discussion of congeniality and compatibility, and the practical implications of this theory for data analysts
  • an updated chapter on performing imputation with derived variables, such as interactions, non-linear effects, sum scores, and splines
  • an expanded chapter on MI with survival data, including imputing missing covariates in Cox models and MI for case-cohort and nested case-control studies
  • new chapters on:
    • multiple imputation for prognostic models
    • multiple imputation for measurement error and misclassification
    • multiple imputation for causal inference
    • using MI in practice
  • practical and theoretical exercises in each chapter

We hope it will be useful for those handling missing data by multiple imputation in their analyses, particularly with regard to using it in a way that accommodates the various complexities often present in statistical analyses.

The book should now be available “in all good bookshops”, as they say. You can find it at Amazon (please note I may receive a commission if you subsequently purchase from Amazon after clicking this link).

Multiple imputation for missing baseline covariates in discrete time survival analysis

A while ago I got involved in a project led by Anna-Carolina Haensch and Bernd Weiß investigating multiple imputation methods for missing baseline covariates in discrete time survival analysis. The work has recently been published open access in the journal Sociological Methods & Research. The paper investigates a variety of multiple imputation approaches. My main contribution was the extension of the substantive model compatible fully conditional specification (smcfcs) approach for multiple imputation to the discrete time survival model setting, and extending the functionality of the smcfcs package in R to incorporate this. In this short post I’ll give a quick demonstration of this functionality.
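As background, a discrete time survival model treats follow-up as a sequence of periods and models the binary event indicator in each period, typically with a logistic regression fitted to data expanded into "person-period" format. The minimal Python sketch below (entirely hypothetical data and function names; the actual analysis uses the smcfcs package in R) illustrates that expansion step:

```python
# Sketch of the person-period ("long") expansion underlying a discrete time
# survival analysis. Hypothetical data; not code from the paper or package.

def person_period(records):
    """Expand (id, time, event) records into one row per person-period.

    `time` is the 1-based period of exit; `event` is 1 for an observed
    event, 0 for censoring. The binary outcome y is 1 only in the final
    period of a subject who experiences the event.
    """
    rows = []
    for pid, time, event in records:
        for t in range(1, time + 1):
            y = 1 if (t == time and event == 1) else 0
            rows.append({"id": pid, "period": t, "y": y})
    return rows

data = [(1, 3, 1),   # subject 1: event in period 3
        (2, 2, 0)]   # subject 2: censored after period 2
long = person_period(data)
# Subject 1 contributes periods 1-3 (y = 0, 0, 1); subject 2 periods 1-2 (y = 0, 0).
```

A discrete time hazard model is then just a binary regression of y on period indicators and covariates, fitted to this expanded dataset; the smcfcs extension imputes missing baseline covariates compatibly with such a model.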


Perfect prediction handling in smcfcs for R

One of the things users have often asked me about the substantive model compatible fully conditional specification multiple imputation approach is the problem of perfect prediction. This problem arises when imputing a binary (or more generally, categorical) variable using a binary (or categorical) predictor, when within one or more levels of the predictor the outcome is always 0 or always 1. Typically a logistic regression model is specified for the binary variable being imputed, and in the case of perfect prediction, the MLE for one or more parameters (on the log odds scale) is infinite. As described by White, Royston and Daniel (2010), this leads to problems in the imputations. In particular, to make the imputation process proper, new parameters of the logistic regression imputation model are drawn from a multivariate normal distribution. The perfect prediction data configuration leads to standard errors that are in theory infinite, and in practice, on a computer, simply very large. These huge standard errors lead to posterior draws (or what are used in place of posterior draws) which fluctuate between being very large and negative and very large and positive, when in reality they ought to be large in only one direction (see Section 4 of White, Royston and Daniel (2010)).

Read more