P-values after multiple imputation using mitools in R

I’ve been using Thomas Lumley’s excellent mitools package in R for applying Rubin’s rules for multiple imputation ever since I wrote the smcfcs package in R. Somebody recently asked me about how they could obtain p-values corresponding to the Rubin’s rules results calculated by the MIcombine function in mitools. In this short post I’ll give some R code to calculate these.

First, let’s load the mitools package and apply MIcombine to the data frame smi, which is provided as part of mitools. smi contains five imputations of data from the Victorian Adolescent Health Cohort Study. The example code for MIcombine is as follows:

library(mitools)

data(smi)
models<-with(smi, glm(drinkreg~wave*sex,family=binomial()))
summary(MIcombine(models))

which gives

Multiple imputation results:
      with(smi, glm(drinkreg ~ wave * sex, family = binomial()))
      MIcombine.default(models)
                results         se      (lower     upper) missInfo
(Intercept) -2.25974358 0.26830731 -2.78584855 -1.7336386      4 %
wave         0.24055250 0.06587423  0.11092461  0.3701804     12 %
sex          0.64905222 0.34919264 -0.03537187  1.3334763      1 %
wave:sex    -0.03725422 0.08609199 -0.20623121  0.1317228      7 %

Rubin’s rules inference is based on using t-distributions, with degrees of freedom calculated using a formula which depends on the number of imputations, the between imputation variance, and the within imputation variance, as described by (for example) Schafer 1999. To calculate the p-values for each coefficient, we can use the following code:

miRes <- MIcombine(models)
tStat <- miRes$coefficients/sqrt(diag(miRes$variance))
#p-values
round(2*pt(-abs(tStat),df=miRes$df),3)

This code first calculates the t-statistics as the coefficients divided by the corresponding standard errors. It then calculates the 2-sided p-value (rounded to 3 decimal places) as twice the tail probability for the corresponding t-distribution. Running the above code gives:

(Intercept)        wave         sex    wave:sex 
      0.000       0.000       0.063       0.665 

for the p-values. Alternatively we can wrap this up in a little function with a default value of 3 decimal places:

MIcombineP <- function(MIcombineRes,digits=3) {
  tStat <- MIcombineRes$coefficients/sqrt(diag(MIcombineRes$variance))
  round(2*pt(-abs(tStat),df=MIcombineRes$df),digits)
}

which can then be called as:

MIcombineP(MIcombine(models))

4 thoughts on “P-values after multiple imputation using mitools in R”

  1. Thanks! Really helpful! Just a typo in the last line of the function, instead of df=miRes$df it should be df=MIcombineRes$df to make it work.

    Reply
    • No, I think (from checking the source code of mitools) that it will use Barnard and Rubin if, when you call MIcombine you pass the complete data degrees of freedom to the df.complete argument. In the post I used a logistic regression example, for which with complete data our standard inference approach assumes a normal distribution (i.e. infinite degrees of freedom).

      Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.