Maximum likelihood multiple imputation

I just came across a very interesting draft paper on arXiv by Paul von Hippel on 'maximum likelihood multiple imputation'. Von Hippel has made many important contributions to the multiple imputation (MI) literature, including the paper which advocated that one 'transform then impute' when the substantive model of interest contains interaction or non-linear terms. The present paper on maximum likelihood multiple imputation is in its seventh draft on arXiv, the first having been released back in 2012. I haven't read every detail of the paper, but it looks to me to be another thought-provoking and potentially practice-changing paper. This post does not attempt to cover all of the important points made in the paper; it just highlights a few.

Imputing missing covariates in nested case-control and case cohort studies

I'm pleased to announce a new version (1.3.0) of the smcfcs package for multiple imputation of missing covariates. Thanks to Ruth Keogh at the London School of Hygiene & Tropical Medicine, this new version features two additional functions, smcfcs.casecohort and smcfcs.nestedcc. These allow imputation of missing covariates in case-cohort and nested case-control studies respectively. A paper describing the methodology is forthcoming.

The package is now on CRAN and so can be installed/updated in the usual way from R or RStudio.

There are of course various papers on the case-cohort and nested case-control study designs. For further reading, I'd recommend looking at Ruth's book, co-authored with David Cox, 'Case-Control Studies', which contains a chapter on each design.

Combining bootstrapping with multiple imputation

Multiple imputation (MI) is a popular approach to handling missing data. In the final part of MI, inferences for parameter estimates are made based on simple rules developed by Rubin. These rules rely on the analyst having a calculable standard error for their parameter estimate for each imputed dataset. This is fine for standard analyses, e.g. regression models fitted by maximum likelihood, where standard errors based on asymptotic theory are easily calculated. However, for many analyses analytic standard errors are not available, or are prohibitively difficult to derive. For such analyses, if there were no missing data, an attractive approach for finding standard errors and confidence intervals is the method of bootstrapping. However, if one is using MI to handle missing data, and would ordinarily use bootstrapping to find standard errors / confidence intervals, how should these be combined?
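
To make concrete why Rubin's rules need a standard error from every imputed dataset, here is a minimal sketch of the pooling step in Python with NumPy. It is an illustration only, not code from any MI package: the function name pool_rubin and the example numbers are made up, and it covers only the pooled point estimate and total variance for a scalar parameter, not the degrees of freedom used for confidence intervals.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool M scalar estimates and their squared standard errors.

    Hypothetical helper for illustration; returns the pooled estimate,
    the total variance and the pooled standard error.
    """
    q = np.asarray(estimates, dtype=float)   # one estimate per imputed dataset
    u = np.asarray(variances, dtype=float)   # one squared standard error per dataset
    m = q.size
    q_bar = q.mean()                         # pooled point estimate
    within = u.mean()                        # average within-imputation variance
    between = q.var(ddof=1)                  # between-imputation variance
    total = within + (1 + 1 / m) * between   # Rubin's total variance
    return q_bar, total, np.sqrt(total)

# Made-up estimates and variances from M = 3 imputed datasets.
q_bar, total, se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The within-imputation term is exactly where the per-dataset standard errors enter, which is why an analysis without an analytic standard error does not slot directly into these rules.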

Multiple imputation for missing covariates in Poisson regression

This week I've released a new version of the smcfcs package for R on CRAN. SMC-FCS performs multiple imputation for missing covariates in regression models, using an adaptation of the chained equations / fully conditional specification approach to imputation, which we called Substantive Model Compatible Fully Conditional Specification MI.

The new version of smcfcs now supports Poisson regression outcome / substantive models, which are often used for count outcomes. Future versions will add support for negative binomial regression models, which are often used to model overdispersed count outcomes, and also support for offsets, which are often needed when fitting count regression models.
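
For readers less familiar with offsets, the following Python sketch shows what one does in a Poisson regression: the log of each subject's exposure enters the linear predictor with its coefficient fixed at 1, so the model describes a rate rather than a raw count. This is not smcfcs code (smcfcs is an R package). It is a small NumPy illustration in which the function name fit_poisson_offset, the simulated data and the parameter values are all assumptions chosen for the example.

```python
import numpy as np

def fit_poisson_offset(X, y, offset, n_iter=25, tol=1e-10):
    """Fit a log-link Poisson regression with an offset by Newton-Raphson.

    Hypothetical helper for illustration only.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)       # fitted means, offset coefficient fixed at 1
        score = X.T @ (y - mu)               # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])       # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data: counts whose mean is exposure * exp(b0 + b1 * x).
rng = np.random.default_rng(2024)
n = 5000
x = rng.normal(size=n)
exposure = rng.uniform(0.5, 2.0, size=n)     # e.g. person-years of follow-up
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-1.0, 0.5])
y = rng.poisson(exposure * np.exp(X @ true_beta))

beta_hat = fit_poisson_offset(X, y, offset=np.log(exposure))
```

With the offset included, beta_hat estimates the log rate per unit of exposure. Leaving it out would fold differences in exposure into the intercept and, if exposure were related to x, into the slope as well.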

On the missing at random assumption in longitudinal trials

The missing at random (MAR) assumption plays an extremely important role in the context of analysing datasets subject to missing data. Its importance lies primarily in the fact that if we are willing to assume data are MAR, we can identify (estimate) target parameters. There are a variety of methods for handling data which are assumed to be MAR. One approach is estimation of a model for the variables of interest using the method of maximum likelihood. In the context of randomised trials, primary analyses are sometimes based on methods which are valid under MAR, such as linear mixed models (MMRM). A key concern, however, is whether the MAR assumption is plausibly valid in any given situation.