Multiple imputation separately by groups in R and Stata

When using multiple imputation to impute missing values, there are often situations where one wants to perform the imputation process completely separately in groups of subjects defined by some fully observed variable (e.g. sex or treatment group). In Stata this is made very easy through the by() option: you simply tell the mi impute command which variable (or variables) you want the imputation stratified on. Stata then imputes separately in the groups defined by this variable (or variables), and assembles the imputations from each stratum back together so that you end up with your desired number of imputed datasets.

Last week someone asked me how to do this in R, ideally with the mice package. Compared to Stata, one has to do a little more work. One approach is to use the mice.impute.bygroup function in the miceadds package, which extends the functionality of mice in various directions. If you instead want to do it manually, you can do so by making use of the rbind function within the mice package.
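To sketch the manual route (with made-up example data; the variable names y, x and sex are purely illustrative): split the data by the grouping variable, run mice separately on each part, and then stack the group-specific mids objects back together with rbind, which dispatches to mice’s rbind.mids method:

```r
# Impute separately by a fully observed grouping variable (sex here),
# then recombine the group-specific imputations.
library(mice)

set.seed(724)

# Made-up example data: y is partially observed, x and sex fully observed
n <- 200
sex <- rep(c(0, 1), each = n / 2)
x <- rnorm(n)
y <- 1 + x + sex + rnorm(n)
y[rbinom(n, 1, 0.3) == 1] <- NA
df <- data.frame(y, x, sex)

# Run mice separately within each level of sex. Within each subset sex
# is constant, so mice will drop it from the imputation models.
impBySex <- lapply(split(df, df$sex), function(d) {
  mice(d, m = 10, printFlag = FALSE)
})

# Stack the two mids objects back together via mice's rbind.mids
impAll <- rbind(impBySex[[1]], impBySex[[2]])

# Analyse each completed dataset and pool with Rubin's rules as usual
fits <- with(impAll, lm(y ~ x + sex))
summary(pool(fits))
```

Note that rbind here requires the group-specific mids objects to be based on the same variables and the same number of imputations, which holds by construction when the data have simply been split by a grouping variable.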


Does the log rank test assume proportional hazards?

A student asked me recently whether the log rank test for time to event data assumes that the hazard ratio between the two groups is constant over time, as is assumed in Cox’s famous proportional hazards model. The BMJ ‘Statistics at Square One’ survival analysis article, for example, says the test assumes:

That the risk of an event in one group relative to the other does not change with time. Thus if linoleic acid reduces the risk of death in patients with colorectal cancer, then this risk reduction does not change with time (the so called proportional hazards assumption).

https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/12-survival-analysis

Personally, I would not say the log rank test assumes proportional hazards. Under the null hypothesis that the (true) survival curves in the two groups are the same, or equivalently that the hazard functions are identical in the two groups, a log rank test performed at the 5% level will wrongly reject only 5% of the time. Of course, under this null the hazards are proportional (indeed identical).
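As a quick numerical illustration of this point, here is a small simulation sketch (using the survival package, with arbitrarily chosen rates and sample size) in which the two groups have identical exponential hazards:

```r
# Estimate the type 1 error of the 5% level log rank test when the
# hazard functions in the two groups are identical
library(survival)

set.seed(1234)
nSim <- 1000
reject <- logical(nSim)

for (i in 1:nSim) {
  n <- 200
  group <- rep(0:1, each = n / 2)
  eventTime <- rexp(n, rate = 0.1)  # same hazard in both groups
  censTime <- rexp(n, rate = 0.05)  # independent random censoring
  obsTime <- pmin(eventTime, censTime)
  status <- as.numeric(eventTime <= censTime)
  lr <- survdiff(Surv(obsTime, status) ~ group)
  reject[i] <- lr$chisq > qchisq(0.95, df = 1)
}

mean(reject)  # close to 0.05, as expected under the null
```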

When this null does not hold and the hazard ratio is constant over time, the log rank test is the most powerful test. When the hazard ratio is not constant over time the test is no longer optimal in terms of power, but the non-proportionality does not invalidate the test per se. It just means that alternative methods of analysis may be preferable (see my recent PSI event slides here).

Reference based multiple imputation – what’s the right variance and how to estimate it?

Reference based multiple imputation methods have become a popular approach for handling missing data in the analysis of randomised trials (Carpenter et al 2013). Very roughly speaking, they impute missing outcomes for patients in the active arm assuming that these outcomes behave as if the patients had switched onto the control treatment. This contrasts with what is now the standard approach, based on the missing at random assumption, which effectively imputes missing outcomes for patients in a given arm as if they had remained on the treatment to which they were randomised.

Soon after reference based MI methods were proposed, people started noticing that Rubin’s rules variance estimator, the standard approach for analysing multiply imputed datasets, overstates the variance of the treatment effect estimate compared to its true frequentist variance (Seaman et al 2014). This means that if Rubin’s rules are used, the type 1 error rate will be below the nominal 5% level, and power will be lower (sometimes substantially) than if the frequentist variance were used for inference.
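For context, recall that Rubin’s rules estimate the variance of the pooled estimate by combining the average within-imputation variance with the between-imputation variance of the estimates. With M imputations, and with \(\hat{\theta}_m\) and \(\widehat{W}_m\) denoting the estimate and its variance from the m-th imputed dataset,

\[
\widehat{V}_{\text{Rubin}} = \bar{W} + \left(1 + \frac{1}{M}\right) B, \qquad
\bar{W} = \frac{1}{M}\sum_{m=1}^{M}\widehat{W}_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m - \bar{\theta}\right)^2,
\]

where \(\bar{\theta}\) is the pooled (mean) estimate. It is this quantity that, under reference based imputation, systematically exceeds the frequentist variance of \(\bar{\theta}\).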

In a new pre-print on arXiv I review the congeniality issue and the bias in Rubin’s variance estimator, and summarise some of the arguments made for and against using Rubin’s rules with reference based methods. In the end I personally conclude that the frequentist variance is the ‘right’ one, but that we should scrutinise further whether the reference based assumptions are reasonable in light of the behaviour they cause for inferences. For instance, they lead to a situation where the more data are missing, the more certain we are about the value of the treatment effect, which would ordinarily seem incorrect.

I also review different approaches for estimating the frequentist variance, should one decide it is of interest, including efficiently combining bootstrapping with multiple imputation, as proposed in a paper by Paul von Hippel and myself (in press at Statistical Science) and available to view here.
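To give a flavour of the bootstrap-then-impute idea, the sketch below (assuming a made-up data frame df with outcome y and covariate x, with interest in the coefficient on x) draws B bootstrap samples, imputes each a small number of times with mice, and combines the resulting estimates via a one-way ANOVA type decomposition. The variance formula shown is my summary of the estimator’s form rather than a definitive implementation; see the paper (or the bootImpute R package, which implements the method) for the exact details.

```r
# Sketch of bootstrap followed by multiple imputation (von Hippel &
# Bartlett). Assumes a data frame df with partially observed outcome y
# and covariate x; the estimand is the coefficient on x.
library(mice)

set.seed(6823)
B <- 200  # number of bootstrap samples
M <- 2    # imputations per bootstrap sample (a small M suffices)

ests <- matrix(NA, nrow = B, ncol = M)
for (b in 1:B) {
  bootData <- df[sample(nrow(df), replace = TRUE), ]
  imp <- mice(bootData, m = M, printFlag = FALSE)
  for (m in 1:M) {
    ests[b, m] <- coef(lm(y ~ x, data = complete(imp, m)))["x"]
  }
}

# Point estimate: the mean of all B*M estimates
pointEst <- mean(ests)

# One-way ANOVA decomposition into between- and within-bootstrap
# mean squares, giving estimated variance components
MSB <- M * sum((rowMeans(ests) - pointEst)^2) / (B - 1)
MSW <- sum((ests - rowMeans(ests))^2) / (B * (M - 1))
betweenVar <- (MSB - MSW) / M  # between-bootstrap variance component

# Variance estimate for pointEst based on these components (consult the
# paper for the definitive formula and accompanying degrees of freedom)
varEst <- (1 + 1 / B) * betweenVar + MSW / (B * M)
sqrt(varEst)  # standard error
```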

I hope the paper stimulates further debate as to what the right variance is for reference based methods, and would very much welcome any comments on it.

19th July 2021 – a short talk about this work can be viewed here.

22nd September 2021 – this work has now been published in the journal Statistics in Biopharmaceutical Research, and is available open-access here.