Note: if you read this post, make sure to read the comments/discussion below it with Richard Morey, author of the paper in question, who put me straight on a number of points.
Thanks to Twitter I came across the latest draft of a very nicely written and thought-provoking paper, “The fallacy of placing confidence in confidence intervals”, by Morey, Rouder, Hoekstra, Lee and Wagenmakers. The paper aims to show why frequentist confidence intervals do not possess a number of properties that researchers often believe they do. In contrast, the authors show that Bayesian credible intervals do possess these desired properties, and they advocate replacing confidence intervals with Bayesian credible intervals.
The fundamental fallacy
Their paper is very nice, and I highly recommend reading it, both in its own right and so that the following makes sense. I am personally becoming more of a Bayesian by the day, so this post is not a defence of confidence intervals in general. Nonetheless, I do take issue somewhat with the authors’ ‘fundamental confidence fallacy’, which I think comes down to different perspectives on the meaning of probability. My view is probably a consequence of my own ignorance or misunderstanding, but I shall describe it here in any case, so that I might be corrected in my thinking.
The authors describe this fundamental fallacy as:
“If the probability that a random interval contains the true value is X%, then the plausibility or probability that a particular observed interval contains the true value is X%,” or “We have X% confidence that the observed interval contains the true value.”
The authors describe an example situation (locating a submarine), with a single unknown parameter: the location of the submarine’s hatch. They then describe a number of different procedures for constructing a 50% confidence interval for this parameter. In the particular setup they consider, the different confidence intervals are nested within each other. They then argue (apparently following Fisher):
“If all intervals had a 50% probability of containing the true value, then all the probability must be contained in the shortest of the intervals. Because each procedure is by itself a 50% procedure, the procedure which chooses the shortest of 50% intervals will contain the true value less than 50% of the time. Hence believing the FCF (fundamental confidence fallacy) results in a logical contradiction.”
This quote begins “If all intervals had a 50% probability of containing the true value”. First, and as the authors clearly understand, given the datasets and the calculated CIs, in truth each CI either does or does not contain the true value. Thus if we say an interval has a 50% probability of containing the truth, we must be clear about what we mean by that probability. A Bayesian defines probability as a measure of certainty or belief, and as far as I understand it, makes no reference to long-run frequencies. In contrast, a frequentist defines probabilities in terms of an imagined repetition of experiments. In the case of a 50% CI, the frequentist means that in 50% of repeated experiments the CI would contain the true value.
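The frequentist long-run reading can be checked directly by simulation. The following is a minimal sketch based on my understanding of the paper’s submarine setup: two bubbles rise from positions uniformly distributed within 5 metres of the hatch location theta, and two different 50% interval procedures are applied to each simulated dataset.

```python
# Hedged sketch of long-run frequentist coverage, assuming the submarine
# setup as I understand it: bubbles ~ Uniform(theta - 5, theta + 5), n = 2.
import numpy as np

rng = np.random.default_rng(1)
theta = 0.0            # true hatch location (coverage does not depend on it)
n_trials = 200_000
y = rng.uniform(theta - 5, theta + 5, size=(n_trials, 2))

# Procedure 1: the interval (min(y), max(y)). Each bubble independently
# falls on either side of theta with probability 1/2, so the bubbles
# straddle theta half the time -- a 50% CI.
cover_np = (y.min(axis=1) < theta) & (theta < y.max(axis=1))

# Procedure 2: ybar +/- d, with d chosen so P(|ybar - theta| < d) = 0.5.
# ybar - theta has a triangular distribution on (-5, 5), which gives
# d = 5 * (1 - sqrt(0.5)).
d = 5 * (1 - np.sqrt(0.5))
cover_mean = np.abs(y.mean(axis=1) - theta) < d

print(cover_np.mean(), cover_mean.mean())   # both close to 0.5
```

Run over many simulated experiments, both procedures contain the true value in very close to 50% of trials, which is exactly what the frequentist probability statement claims.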
After some deductions, the authors’ argument ends “the procedure which chooses the shortest of 50% intervals will contain the true value less than 50% of the time”. What do the authors mean here by “50% of the time”? It sounds to me as if they are essentially calibrating the 50% probability statement by reference to a long run sequence of experiments. If this is indeed the case, and the procedure they have chosen (the one which produces the shortest intervals) is indeed a 50% CI, then, contrary to their conclusion, it will surely contain the true value in 50% of repeated experiments.
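The Fisher-style argument can also be explored by simulation. The sketch below uses the two 50% procedures from the same hypothetical setup (these are not the paper’s nested intervals, so this is only a loose analogue of their argument): each procedure alone covers the truth 50% of the time, but a third procedure that always reports whichever interval happens to be shorter covers the truth less often.

```python
# Rough illustration of the "shortest interval" argument, assuming the
# submarine setup (bubbles ~ Uniform(theta - 5, theta + 5), n = 2) and two
# 50% procedures; note these are not the paper's nested intervals.
import numpy as np

rng = np.random.default_rng(2)
theta, n_trials = 0.0, 200_000
y = rng.uniform(theta - 5, theta + 5, size=(n_trials, 2))

lo1, hi1 = y.min(axis=1), y.max(axis=1)      # 50% CI: (min, max)
d = 5 * (1 - np.sqrt(0.5))
ybar = y.mean(axis=1)
lo2, hi2 = ybar - d, ybar + d                # 50% CI: mean-based

use1 = (hi1 - lo1) < (hi2 - lo2)             # always pick the shorter interval
lo = np.where(use1, lo1, lo2)
hi = np.where(use1, hi1, hi2)
coverage = ((lo < theta) & (theta < hi)).mean()
print(coverage)                              # noticeably below 0.5
```

The intuition for the under-coverage is that the (min, max) interval is short precisely when the bubbles are close together, which is exactly when the data are least informative; selecting on shortness therefore selects the datasets where the interval is least likely to contain the truth. This is consistent with the authors’ point that the shortest-interval procedure is no longer a 50% procedure; my disagreement above is only with describing the output of a genuine 50% procedure as having other than 50% long-run coverage.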
The authors then go on to argue in a different way that the fallacy cannot be true. In the submarine example (Fig 1b), they give a dataset where, by the nature of the problem (read the paper to see the details), you can almost with certainty logically infer the value of the parameter. Thus with this dataset you can logically deduce (or obtain the same answer using the likelihood function) the true value of the parameter. The authors point out, however, that three of the CIs they consider encompass the likelihood interval (the range of values for which the likelihood function is non-zero). They then argue:
“…meaning that there is 100% certainty that these 50% confidence intervals contain the hatch. Reporting 50% certainty in an interval that surely contains the parameter would clearly be a mistake”
Again, this ‘proof’ of the fallacy being false seems problematic, in the following way. If someone gives you one of these three 50% CIs, and you know nothing else, then you might (although the authors wouldn’t!) state that your interval has a 50% probability (in the long-run sense) of containing the truth, and I think this is correct. Now suppose someone then explains to you that, because of the structure of the problem and the data observed, it is in fact possible to logically deduce that the true value of the parameter is guaranteed to be included in your 50% CI. In this case, it would be ridiculous to just report the 50% CI, rather than reporting that we can in fact deduce the true value of the parameter. Nonetheless, in my view this does not invalidate the 50% probability statement that was made before this clever person came along and explained that you could deduce the true value of the parameter here. Here, a researcher basing inferences on the likelihood function or a Bayesian analysis would be able to conclude with certainty the value of the parameter, while a researcher using only one of the three aforementioned CIs would not. This example, and no doubt others, shows that frequentist methods can be sub-optimal as a method of inference, and I would entirely agree with that conclusion. However, for me this does not render the ‘fundamental fallacy’ false.
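The distinction I am drawing between unconditional and conditional coverage can be made concrete. In my understanding of the setup, each bubble lies within 5 metres of the hatch, so whenever the two bubbles are more than 5 metres apart the (min, max) “50%” interval is guaranteed to contain the true value, even though the procedure’s unconditional long-run coverage remains 50%:

```python
# Sketch of conditional vs unconditional coverage, assuming the submarine
# setup: bubbles ~ Uniform(theta - 5, theta + 5), n = 2.
import numpy as np

rng = np.random.default_rng(3)
theta, n_trials = 0.0, 200_000
y = rng.uniform(theta - 5, theta + 5, size=(n_trials, 2))
lo, hi = y.min(axis=1), y.max(axis=1)
covered = (lo < theta) & (theta < hi)

far_apart = (hi - lo) > 5          # datasets where deduction pins theta down
print(covered.mean())              # unconditional coverage: ~0.5
print(covered[far_apart].mean())   # conditional on bubbles > 5m apart: 1.0

# For one hypothetical far-apart dataset, the likelihood interval
# (max - 5, min + 5) is tiny and sits inside the much wider 50% CI:
y1, y2 = 0.2, 9.9                  # hypothetical bubbles; theta must be near 5
print((y2 - 5, y1 + 5))            # likelihood interval: (4.9, 5.2)
print((min(y1, y2), max(y1, y2)))  # "50%" CI: (0.2, 9.9) -- contains it entirely
```

Both statements are true at once: conditional on this kind of dataset, the interval certainly contains the truth; over all repetitions of the experiment, the procedure covers the truth 50% of the time.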
On giving up confidence intervals and instead being a Bayesian
The paper concludes by arguing that giving up CIs is highly advisable, in that we do not lose much, but would gain a lot by using Bayesian inference. I think I agree that one does gain a lot by being a Bayesian. However, in practice it is arguably not as simple to be a Bayesian as the authors seem to imply. The difficulty of course is that to perform a Bayesian analysis we must, in addition to the model, specify priors for the parameters, which is sometimes easier said than done. That is not to say that specifying priors is impossible or always difficult, but simply that prior specification is not a piece of cake, and sometimes (with small datasets or little information about parameters) it materially affects posterior inferences. Moreover, while for simple models it may be relatively straightforward to think about what is reasonable to specify as a prior, for more complex models this seems to me to become much more problematic. For example, in a complex model of longitudinal data, using random effects to model trajectories with cubic splines, what would be my prior for the variances of the random effects and their correlations? I’m really not sure!