P-values after multiple imputation using mitools in R

I’ve been using Thomas Lumley’s excellent mitools package in R for applying Rubin’s rules for multiple imputation ever since I wrote the smcfcs package in R. Somebody recently asked me about how they could obtain p-values corresponding to the Rubin’s rules results calculated by the MIcombine function in mitools. In this short post I’ll give some R code to calculate these.

First, let’s load the mitools package and apply MIcombine to the data frame smi, which is provided as part of mitools. smi contains five imputations of data from the Victorian Adolescent Health Cohort Study. The example code for MIcombine is as follows:

library(mitools)

data(smi)
models<-with(smi, glm(drinkreg~wave*sex,family=binomial()))
summary(MIcombine(models))

which gives

Multiple imputation results:
      with(smi, glm(drinkreg ~ wave * sex, family = binomial()))
      MIcombine.default(models)
                results         se      (lower     upper) missInfo
(Intercept) -2.25974358 0.26830731 -2.78584855 -1.7336386      4 %
wave         0.24055250 0.06587423  0.11092461  0.3701804     12 %
sex          0.64905222 0.34919264 -0.03537187  1.3334763      1 %
wave:sex    -0.03725422 0.08609199 -0.20623121  0.1317228      7 %

Rubin’s rules inference is based on using t-distributions, with degrees of freedom calculated using a formula which depends on the number of imputations, the between imputation variance, and the within imputation variance, as described by (for example) Schafer 1999. To calculate the p-values for each coefficient, we can use the following code:

miRes <- MIcombine(models)
tStat <- miRes$coefficients/sqrt(diag(miRes$variance))
#p-values
round(2*pt(-abs(tStat),df=miRes$df),3)

This code first calculates the t-statistics as the coefficients divided by the corresponding standard errors. It then calculates the 2-sided p-value (rounded to 3 decimal places) as twice the tail probability for the corresponding t-distribution. Running the above code gives:

(Intercept)        wave         sex    wave:sex 
      0.000       0.000       0.063       0.665

for the p-values. Alternatively we can wrap this up in a little function with a default value of 3 decimal places:

MIcombineP <- function(MIcombineRes,digits=3) {
  tStat <- MIcombineRes$coefficients/sqrt(diag(MIcombineRes$variance))
  round(2*pt(-abs(tStat),df=MIcombineRes$df),digits)
}

which can then be called as:

MIcombineP(MIcombine(models))

4 thoughts on “P-values after multiple imputation using mitools in R”

Juan

July 3, 2021 at 3:19 pm

Thanks! Really helpful! Just a typo in the last line of the function, instead of df=miRes$df it should be df=MIcombineRes$df to make it work.
- Jonathan Bartlett
  
  July 5, 2021 at 9:14 pm
  
  Thanks Juan for spotting that – fixed now!
Losos

August 19, 2021 at 11:29 am

Hi Jonathan, This is really helpful! I think the dfs are the “old” dfs (e.g., https://bookdown.org/mwheymans/bookmi/rubins-rules.html), right? Is there an easy way to use the Barnard and Rubin (1999) adjusted dfs in this function ? Would you generally recommend using the adjusted dfs over the old dfs?
- Jonathan Bartlett
  
  September 17, 2021 at 5:42 pm
  
  No, I think (from checking the source code of mitools) that it will use Barnard and Rubin if, when you call MIcombine you pass the complete data degrees of freedom to the df.complete argument. In the post I used a logistic regression example, for which with complete data our standard inference approach assumes a normal distribution (i.e. infinite degrees of freedom).

4 thoughts on “P-values after multiple imputation using mitools in R”

Leave a Reply to Jonathan BartlettCancel reply