I’ve been using Thomas Lumley’s excellent mitools package in R for applying Rubin’s rules for multiple imputation ever since I wrote the smcfcs package in R. Somebody recently asked me about how they could obtain p-values corresponding to the Rubin’s rules results calculated by the MIcombine function in mitools. In this short post I’ll give some R code to calculate these.
First, let’s load the mitools package and apply MIcombine to the data frame smi, which is provided as part of mitools. smi contains five imputations of data from the Victorian Adolescent Health Cohort Study. The example code for MIcombine is as follows:
library(mitools)
data(smi)
models<-with(smi, glm(drinkreg~wave*sex,family=binomial()))
summary(MIcombine(models))
which gives
Multiple imputation results:
with(smi, glm(drinkreg ~ wave * sex, family = binomial()))
MIcombine.default(models)
results se (lower upper) missInfo
(Intercept) -2.25974358 0.26830731 -2.78584855 -1.7336386 4 %
wave 0.24055250 0.06587423 0.11092461 0.3701804 12 %
sex 0.64905222 0.34919264 -0.03537187 1.3334763 1 %
wave:sex -0.03725422 0.08609199 -0.20623121 0.1317228 7 %
Rubin’s rules inference is based on using t-distributions, with degrees of freedom calculated using a formula which depends on the number of imputations, the between imputation variance, and the within imputation variance, as described by (for example) Schafer 1999. To calculate the p-values for each coefficient, we can use the following code:
miRes <- MIcombine(models)
tStat <- miRes$coefficients/sqrt(diag(miRes$variance))
#p-values
round(2*pt(-abs(tStat),df=miRes$df),3)
This code first calculates the t-statistics as the coefficients divided by the corresponding standard errors. It then calculates the 2-sided p-value (rounded to 3 decimal places) as twice the tail probability for the corresponding t-distribution. Running the above code gives:
(Intercept) wave sex wave:sex
0.000 0.000 0.063 0.665
for the p-values. Alternatively we can wrap this up in a little function with a default value of 3 decimal places:
MIcombineP <- function(MIcombineRes,digits=3) {
tStat <- MIcombineRes$coefficients/sqrt(diag(MIcombineRes$variance))
round(2*pt(-abs(tStat),df=MIcombineRes$df),digits)
}
which can then be called as:
MIcombineP(MIcombine(models))
Thanks! Really helpful! Just a typo in the last line of the function, instead of df=miRes$df it should be df=MIcombineRes$df to make it work.
Thanks Juan for spotting that – fixed now!
Hi Jonathan, This is really helpful! I think the dfs are the “old” dfs (e.g., https://bookdown.org/mwheymans/bookmi/rubins-rules.html), right? Is there an easy way to use the Barnard and Rubin (1999) adjusted dfs in this function ? Would you generally recommend using the adjusted dfs over the old dfs?
No, I think (from checking the source code of mitools) that it will use Barnard and Rubin if, when you call MIcombine you pass the complete data degrees of freedom to the df.complete argument. In the post I used a logistic regression example, for which with complete data our standard inference approach assumes a normal distribution (i.e. infinite degrees of freedom).