Confidence intervals for the hazard ratio in RCTs that agree with the log rank test

The log rank test is often used to test the hypothesis of equality for the survival functions of two treatment groups in a randomised controlled trial. Alongside this, trials often estimate the hazard ratio (HR) comparing the hazards of failure in the two groups. Typically the HR is estimated by fitting Cox's proportional hazards model, and a 95% confidence interval is used to indicate the precision of the estimated HR.

There are of course many different ways of constructing confidence intervals for parameter estimates. For estimates found by the method of maximum likelihood, we most often use so-called Wald intervals, which are formed by taking the estimated log HR plus and minus 1.96 standard errors and then exponentiating the limits. A drawback of the Wald interval is that the log rank p-value can be statistically significant while the Wald 95% interval for the HR includes the null value of 1, leading to an apparently inconsistent result.
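As a minimal sketch of the Wald construction just described, the following computes the interval on the log HR scale and exponentiates. The function name `wald_ci_hr` and the input numbers are made up for illustration; in practice the log HR and its standard error would come from a fitted Cox model.

```python
import math

def wald_ci_hr(log_hr, se, z=1.96):
    """Wald interval for the HR: exponentiate log HR +/- z standard errors."""
    lower = math.exp(log_hr - z * se)
    upper = math.exp(log_hr + z * se)
    return math.exp(log_hr), lower, upper

# Hypothetical inputs, for illustration only (not taken from any trial).
hr, lo, hi = wald_ci_hr(log_hr=math.log(0.70), se=0.19)
print(f"HR = {hr:.2f}, 95% Wald CI ({lo:.2f}, {hi:.2f})")
```

With these made-up inputs the upper limit falls just above 1, which is the kind of borderline case where the Wald interval and the log rank test can disagree.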

One approach to avoid the possibility of this inconsistency is to form the CI based on the likelihood score test. When there are no tied failure times in the dataset, this approach gives a 95% CI for the HR which includes 1 if and only if the log rank test p-value is greater than 0.05. Unfortunately, this concordance no longer holds when there are ties.
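To make the idea of a score-test-based interval concrete, here is a rough sketch for a two-group comparison. It inverts the ordinary Cox partial likelihood score test (not the modified test of Lin et al) by finding the log HR values at which the standardised score equals plus or minus 1.96. The function names, the toy data, the Breslow-type handling of ties and the search bounds of plus or minus 10 on the log HR scale are all assumptions made for illustration.

```python
import math

def risk_set_table(times, events, groups):
    """Per distinct event time: (n1, n0, d1, d) = numbers at risk in groups 1 and 0,
    events in group 1, and total events."""
    table = []
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        n1 = sum(1 for ti, gi in zip(times, groups) if ti >= t and gi == 1)
        n0 = sum(1 for ti, gi in zip(times, groups) if ti >= t and gi == 0)
        d1 = sum(1 for ti, ei, gi in zip(times, events, groups)
                 if ti == t and ei == 1 and gi == 1)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        table.append((n1, n0, d1, d))
    return table

def score_z(beta, table):
    """Standardised partial likelihood score U(beta) / sqrt(I(beta))."""
    w = math.exp(beta)
    u = info = 0.0
    for n1, n0, d1, d in table:
        p = n1 * w / (n1 * w + n0)   # weighted share of group 1 in the risk set
        u += d1 - d * p              # observed minus expected under log HR = beta
        info += d * p * (1.0 - p)    # Breslow-type information (assumption)
    return u / math.sqrt(info)

def bisect(f, a, b, tol=1e-10):
    """Plain bisection; requires f(a) and f(b) to have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("root not bracketed")
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b, fb = m, fm
        else:
            a, fa = m, fm
    return 0.5 * (a + b)

def score_ci_hr(table, z=1.96, bound=10.0):
    """CI limits for the HR: the log HRs where the standardised score is +z and -z."""
    lower = bisect(lambda b: score_z(b, table) - z, -bound, bound)
    upper = bisect(lambda b: score_z(b, table) + z, -bound, bound)
    return math.exp(lower), math.exp(upper)

# Toy data with no tied failure times (illustrative only).
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
groups = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

table = risk_set_table(times, events, groups)
hr_lo, hr_hi = score_ci_hr(table)
print(f"score-based 95% CI for HR: ({hr_lo:.2f}, {hr_hi:.2f})")
print("score z at HR = 1:", round(score_z(0.0, table), 3))
```

With no ties, the standardised score at a log HR of 0 is the log rank statistic, so the interval contains 1 exactly when that statistic is below 1.96 in absolute value. The bisection assumes each limit is bracketed by the search bounds, which needs events from both groups at times when both groups are still at risk.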

An alternative is to estimate the HR and form a 95% CI based on an approach proposed by Peto. The 95% CI for the HR formed using Peto's method contains 1 if and only if the log rank test p-value is greater than 0.05, even when there are ties. Unfortunately, as shown in a recently published paper by Lin et al in Biometrics, Peto's estimator for the HR is not consistent. Thus even in large samples it is biased (although perhaps not by much), and consequently the corresponding 95% CI does not have the correct coverage level.
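A small sketch may help show why Peto's interval always agrees with the log rank test. It assumes the usual form of Peto's estimator, log HR = (O - E)/V with standard error 1/sqrt(V), where O - E and V are the log rank observed-minus-expected count and its hypergeometric variance. The function names and the toy data (which include tied failure times) are illustrative assumptions.

```python
import math

def logrank_o_minus_e_and_v(times, events, groups):
    """Return (O - E, V) for group 1, using the hypergeometric variance (handles ties)."""
    o_minus_e = 0.0
    v = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        n = sum(1 for ti in times if ti >= t)                      # at risk, both groups
        n1 = sum(1 for ti, gi in zip(times, groups) if ti >= t and gi == 1)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, gi in zip(times, events, groups)
                 if ti == t and ei == 1 and gi == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            v += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, v

def peto_hr_ci(o_minus_e, v, z=1.96):
    """Peto's estimate and CI: exp((O - E)/V +/- z/sqrt(V))."""
    log_hr = o_minus_e / v
    half_width = z / math.sqrt(v)
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)

# Toy data with tied failure times at t = 3 and t = 8 (illustrative only).
times = [2, 3, 3, 5, 6, 7, 8, 8, 9, 12]
events = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
groups = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

o_minus_e, v = logrank_o_minus_e_and_v(times, events, groups)
z_stat = o_minus_e / math.sqrt(v)                  # log rank z statistic
p_value = math.erfc(abs(z_stat) / math.sqrt(2))    # two-sided normal p-value
hr, lo, hi = peto_hr_ci(o_minus_e, v)
print(f"log rank p = {p_value:.3f}; Peto HR = {hr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

The interval on the log scale contains 0 exactly when |O - E|/V is at most 1.96/sqrt(V), which rearranges to |O - E|/sqrt(V) being at most 1.96, the log rank criterion itself. The agreement is therefore algebraic and does not depend on whether there are ties.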

To address this, Lin et al propose a modification to the likelihood score test, and invert this modified score test to form a 95% CI for the HR. Their approach ensures consistency with the log rank test p-value, including when stratification factors are included. Unlike the CI found from Peto's method, their proposed CI has correct coverage, and it is generally narrower than the Wald-based CI.

Lin et al's approach requires a numerical method to find the CI limits, and they have made available a SAS macro implementing it. Their approach seems attractive, and it will be interesting to see how quickly it is taken up in trial analyses.
