I was recently asked whether smcfcs, my R and Stata package for multiple imputation of covariates, can accommodate non-linear relationships between covariates. The answer is yes, and in this post I’ll illustrate how this can be done.
Maximum likelihood multiple imputation
I just came across a very interesting draft paper on arXiv by Paul von Hippel on ‘maximum likelihood multiple imputation’. Von Hippel has made many important contributions to the multiple imputation (MI) literature, including the paper which advocated that one ‘transform then impute’ when one has interaction or non-linear terms in the substantive model of interest. The present paper on maximum likelihood multiple imputation is in its seventh draft on arXiv, the first having been released back in 2012. I haven’t read every detail of the paper, but it looks to me to be another thought-provoking and potentially practice-changing paper. This post will not attempt to cover all of the important points made in the paper, but will just highlight a few.
Imputing missing covariates in nested case-control and case-cohort studies
I’m pleased to announce a new version (1.3.0) of the smcfcs package for multiple imputation of missing covariates. Thanks to Ruth Keogh at the London School of Hygiene & Tropical Medicine, this new version features two additional functions, smcfcs.casecohort and smcfcs.nestedcc. These allow for imputation of missing covariates in case-cohort and nested case-control studies respectively. A paper describing the methodology is forthcoming.
The package is now on CRAN and so can be installed/updated in the usual way from R or RStudio.
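For anyone unsure what ‘the usual way’ looks like, here is a minimal sketch, assuming R has a CRAN mirror configured and an internet connection is available. It uses only base R’s install.packages, library and packageVersion:

```r
# Install (or update to) the current smcfcs release from CRAN
install.packages("smcfcs")

# Load the package and check which version is now installed;
# the new functions described above require version 1.3.0 or later
library(smcfcs)
packageVersion("smcfcs")
```

Once the package is loaded, ?smcfcs.casecohort and ?smcfcs.nestedcc bring up the help pages for the two new functions.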
There are of course various papers on the case-cohort and nested case-control study designs. For further reading, I’d recommend looking at Ruth’s book, co-authored with David Cox, ‘Case-Control Studies’, which contains a chapter on each design.
11/06/2018 – the corresponding paper has now been published in Biometrics.