A student asked me recently whether the log rank test for time to event data *assumes* that the hazard ratio between the two groups is constant over time, as is assumed in Cox’s famous proportional hazards model. The BMJ ‘Statistics at square one’ Survival Analysis article for example says the test assumes:

That the risk of an event in one group relative to the other does not change with time. Thus if linoleic acid reduces the risk of death in patients with colorectal cancer, then this risk reduction does not change with time (the so called proportional hazards assumption).

https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/12-survival-analysis

Personally I would not say the log rank test *assumes* proportional hazards. Under the null hypothesis that the (true) survival curves in the two groups are the same, or equivalently that the hazard functions are identical in the two groups, the log rank test performed at the 5% significance level would wrongly reject only 5% of the time (at least in large samples). Of course under this null the hazards are proportional (indeed identical).
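The type I error claim can be checked with a small simulation. The following is a minimal sketch, not a reference implementation: the log rank statistic is coded by hand with NumPy, and the function names, the Weibull event times, the uniform censoring, the sample size and the seed are all illustrative assumptions of mine.

```python
import math
import numpy as np

def logrank_pvalue(time, event, group):
    """Two-sided log rank p-value for two groups (group coded 0/1)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=int)
    obs_minus_exp = 0.0
    var = 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()                      # total at risk
        n1 = (at_risk & (group == 1)).sum()    # at risk in group 1
        dying = event & (time == t)
        d = dying.sum()                        # events at t
        d1 = (dying & (group == 1)).sum()      # events at t in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:                              # hypergeometric variance
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp**2 / var
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def simulate_rejection_rate(n_sims=1000, n_per_group=50, seed=2024):
    """Share of simulated null data sets in which the test rejects at 5%."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)
    rejections = 0
    for _ in range(n_sims):
        # Same Weibull distribution in both groups, so the null is true
        t_event = rng.weibull(0.5, size=2 * n_per_group)
        t_cens = rng.uniform(0.0, 5.0, size=2 * n_per_group)
        time = np.minimum(t_event, t_cens)
        event = t_event <= t_cens
        rejections += logrank_pvalue(time, event, group) < 0.05
    return rejections / n_sims

rate = simulate_rejection_rate()
print(f"Empirical type I error: {rate:.3f}")
```

The empirical rejection rate should land near the nominal 5%, up to Monte Carlo error.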

When this null does not hold, if the hazard ratio is constant over time, the log rank test is the most powerful test. When it is not constant over time it is not optimal in terms of power, but the non-constant hazard ratio does not invalidate the test per se. It just means that there may be alternative methods of analysis that might be preferable (see my recent PSI event slides here).
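To illustrate the power point, here is a rough simulation sketch comparing the standard log rank test with one possible alternative, a Fleming-Harrington G(0, 1) weighted log rank test that upweights later event times. The delayed-effect scenario (hazard ratio 1 until time 0.7, then 0.5), the administrative censoring time, the sample sizes, the seed and the function names are my own illustrative assumptions, not something taken from the slides.

```python
import math
import numpy as np

def fh_pvalues(time, event, group, gammas=(0.0, 1.0)):
    """Two-sided p-values for Fleming-Harrington G(0, gamma) tests.

    gamma = 0 is the standard log rank test; gamma > 0 weights each event
    time by (1 - S(t-))**gamma, with S the pooled Kaplan-Meier estimate.
    """
    gammas = np.asarray(gammas, dtype=float)
    score = np.zeros(len(gammas))
    var = np.zeros(len(gammas))
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        dying = event & (time == t)
        d = dying.sum()
        d1 = (dying & (group == 1)).sum()
        w = (1.0 - surv) ** gammas
        score += w * (d1 - d * n1 / n)
        if n > 1:
            var += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv *= 1.0 - d / n
    z = score / np.sqrt(var)
    return [math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z]

def simulate_power(n_sims=500, n_per_group=100, delay=0.7, late_hr=0.5, seed=7):
    """Rejection rates (log rank, G(0, 1)) under a delayed treatment effect."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)
    rejections = np.zeros(2)
    for _ in range(n_sims):
        # Draw unit-exponential cumulative hazards, then invert each group's
        # cumulative hazard function to obtain event times.
        e = rng.exponential(1.0, size=2 * n_per_group)
        t_treated = np.where(e <= delay, e, delay + (e - delay) / late_hr)
        t_event = np.where(group == 1, t_treated, e)  # control hazard is 1
        time = np.minimum(t_event, 3.0)  # administrative censoring at t = 3
        event = t_event <= 3.0
        rejections += np.array(fh_pvalues(time, event, group)) < 0.05
    return rejections / n_sims

power_logrank, power_late = simulate_power()
print(f"log rank: {power_logrank:.2f}, G(0, 1): {power_late:.2f}")
```

In a scenario like this the late-weighted test would be expected to reject more often than the unweighted log rank test. The weights have to be chosen in advance, though, and a poor choice can lose power relative to the log rank test.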

I agree. Therefore, given that a Cox model and the log rank test give identical results if both use a score test, does the Cox model also not ‘assume’ proportional hazards? And when we say ‘assume’, do we actually mean most powerful under H1 when PH holds?

I agree the hypothesis (score) test of beta=0 from Cox (as you say equivalent to log rank) does not assume proportional hazards for type 1 error control. But the Cox model is a semiparametric model for the data, with parameter beta, and which makes the assumption/restriction of proportional hazards.

The Cox model and the logrank test make the proportional hazards assumption to the identical degree. This has to be the case since the logrank test statistic is exactly the score test under the Cox model. The null case is a triviality. Under the null, any model that only makes assumptions about relative effects will hold. So the proportional odds assumption holds under the null.
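The equivalence between the log rank statistic and the Cox score test can be checked numerically. The sketch below is an illustration under assumptions of my own (simulated continuous event times so that there are no ties, hypothetical function names, arbitrary rates and seed). It computes the log rank observed-minus-expected count and its variance, then separately computes the score and information of the Cox partial likelihood at beta = 0 for a single binary covariate.

```python
import numpy as np

def logrank_components(time, event, group):
    """Observed minus expected events in group 1, and its variance."""
    o_minus_e = 0.0
    var = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        dying = event & (time == t)
        d = dying.sum()
        d1 = (dying & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var

def cox_score_at_zero(time, event, z):
    """Score and information of the Cox partial likelihood at beta = 0."""
    score = 0.0
    info = 0.0
    for i in np.flatnonzero(event):
        risk = time >= time[i]          # risk set at this event time
        zbar = z[risk].mean()           # at beta = 0 all risk weights are 1
        score += z[i] - zbar
        info += (z[risk] ** 2).mean() - zbar**2
    return score, info

rng = np.random.default_rng(1)
n = 60
group = np.repeat([0.0, 1.0], n // 2)
# Different event time distributions in the two groups, so the null is false
t_event = rng.exponential(np.where(group == 1, 1.5, 1.0))
t_cens = rng.uniform(0.0, 3.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens

lr_score, lr_var = logrank_components(time, event, group)
cox_score, cox_info = cox_score_at_zero(time, event, group)
print(lr_score, cox_score)
print(lr_var, cox_info)
```

With no tied event times the two pairs of numbers agree up to floating point error. With ties, the variance terms can differ slightly depending on how ties are handled.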

But the Cox model is a model, not just a test, whereas the logrank test is just a test of a nonparametric null hypothesis which specifies equal survival functions. As you say, the Cox score test of beta=0 and log rank test are the same, but the Cox model (as opposed to test) does assume proportional hazards, and it defines this time constant ratio in hazards as its parameter. If the proportional hazards assumption does not hold, Cox’s estimator does not generally converge to a sensible, well-defined estimand. In contrast, the log rank test may remain useful in the sense that it still has power to reject the null, although as we know well, there are scenarios where the power of alternative tests (e.g. weighted log rank tests) may be materially higher.

Jonathan, I’m not getting that logic. The Cox model is a model and has 3 tests associated with it, one of them being the score test. The log rank test remains useful under non-PH exactly by the same amount that the Cox model remains useful under non-PH. My opinion is that we should be speaking of what assumptions are required for a method to be optimal, or at least to perform well. Both Cox and log rank require PH to be satisfied to be optimal.

As the author of Statistics at Square One I’d like to thank Jonathan for his sensible comments. Obviously a test is only based on the null hypothesis and so doesn’t ‘assume’ proportional odds for the alternative hypothesis, in a similar way that a Mann-Whitney U test doesn’t ‘assume’ a shift in location of the outcome for the two groups under the alternative. What perhaps I meant to say was that it is helpful to assume proportional hazards to interpret results. Later on in the chapter I point out that the log-rank test is quite ‘robust’ to departures from proportional hazards, but one should always look at the survival curves and if, for example, the curves from the two groups appear quite different but cross, then the log-rank test should not be used. If there is no difference in the two groups then the curves might cross by chance but they would be close. This comment came at an opportune moment, since I am preparing Edition 12 of Statistics at Square One, and was able to modify the statement given in the 11th edition. Thanks.

Thanks Mike, I did not know you were the author of the series! Best wishes, Jonathan.

The proportional odds model / Wilcoxon test relationship is isomorphic to the Cox / log rank test relationship so it’s great that you brought this up. For the Wilcoxon test to have optimal power the proportional odds assumption must be satisfied. Every test assumes something. Tests assume much less for the type I assertion probability alpha to be accurate, and assume more for the type II probability beta to be low.