This post was prompted by a tweet by Frank Harrell yesterday asking:
In this post I’ll say a little bit about trying to answer Frank’s question, and then a little bit about an alternative question which I posed in response, namely, how does the interpretation change if the interval is a Bayesian credible interval, rather than a frequentist confidence interval.
Frequentist confidence intervals
A frequentist 95% confidence interval is constructed such that, if the model assumptions are correct and you were to (hypothetically) repeat the experiment or sampling many, many times, 95% of the intervals constructed would contain the true value of the parameter. I made a short video last year which performs a simulation in R to demonstrate this definition/idea.
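The repeated-sampling definition can be checked numerically. The following is a minimal Python sketch (not the R simulation from the video): it assumes normally distributed data, and the true mean, standard deviation, sample size and number of repetitions are all arbitrary values chosen for illustration.

```python
import random
import statistics

def coverage(true_mean=0.8, sd=0.2, n=100, reps=2000, seed=1):
    """Fraction of approximate 95% intervals for the mean that contain true_mean."""
    rng = random.Random(seed)
    z = 1.96  # approximate 97.5% quantile of the standard normal
    hits = 0
    for _ in range(reps):
        # One hypothetical repetition of the experiment.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        estimate = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if estimate - z * se <= true_mean <= estimate + z * se:
            hits += 1
    return hits / reps

print(coverage())  # should be close to 0.95
```

Each individual interval either contains 0.8 or it does not; the 95% describes the procedure across repetitions.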
Frank asked how one interprets a particular realised confidence interval (0.72,0.91). The difficulty (as he is well aware!) is that all a frequentist can say is that this particular realised interval either does or does not contain the true parameter value, and they cannot tell you if it does or doesn’t. All they can say is that if modelling assumptions are correct, in hypothetical repetitions 95% of the intervals constructed using this procedure contain the true value.
Bayesian credible intervals
Let’s now suppose that we’ve done a Bayesian analysis. We’ve specified a prior distribution for the parameter, based on prior evidence, our subjective beliefs about the value of the parameter, or perhaps we used a default ‘non-informative’ prior built into our software package. We use the same model as before, and Bayes theorem gives us the posterior distribution. A Bayesian posterior credible interval is constructed, and suppose it gives us some values. For the sake of simplicity, I’ll assume the interval is again 0.72 to 0.91, but this is not done to suggest a Bayesian analysis credible interval will generally be identical to the frequentist’s confidence interval.
How should we interpret this credible interval? A Bayesian would, I think, say something like: it is an interval within which there is a 95% chance or probability that the true parameter lies. At this point we must ask, what do they mean by 95% probability?
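To make the construction concrete, here is a minimal Python sketch of an equal-tailed 95% credible interval for a proportion. Everything specific in it is an assumption for illustration: the binomial model, the uniform Beta(1, 1) prior, and the hypothetical data of 41 successes and 9 failures. With a Beta prior the posterior is again a Beta distribution, so the interval can be read off from sorted posterior draws.

```python
import random

def credible_interval(successes, failures, a=1.0, b=1.0,
                      level=0.95, draws=20000, seed=1):
    """Equal-tailed credible interval for a proportion under a Beta(a, b) prior."""
    rng = random.Random(seed)
    # Conjugacy: the posterior is Beta(a + successes, b + failures).
    post = sorted(rng.betavariate(a + successes, b + failures)
                  for _ in range(draws))
    lower = post[round((1 - level) / 2 * draws)]
    upper = post[round((1 + level) / 2 * draws) - 1]
    return lower, upper

print(credible_interval(successes=41, failures=9))
```

By construction about 95% of the posterior draws fall inside the returned interval, which is the sense in which the posterior assigns it 95% probability. What that probability means is the question taken up next.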
We could interpret it as a classical long run frequentist probability, but this means interpreting it like a confidence interval. In fact Bayesian procedures often have good frequentist properties. For example see Wang and Robins 1998 for an analysis of the frequentist properties of multiple imputation for missing data, or Bartlett and Keogh 2018 for a simulation investigation of the frequentist properties of Bayesian approaches for handling covariate measurement error. Indeed, under certain conditions, Bayesian procedures achieve the same frequentist properties as maximum likelihood methods when the sample size gets large – see Chapter 4 of Gelman et al’s excellent Bayesian Data Analysis book.
But conceptually we do not choose to do a Bayesian analysis simply as a means to performing frequentist inference. We choose it because it (hopefully) answers more directly what we are interested in (see Frank Harrell’s ‘My Journey From Frequentist to Bayesian Statistics’ post). Namely, it enables us to make probability statements about the unknown parameter given our model, the prior, and the data we have observed. So what is the interpretation of the 95% chance or probability for a credible interval?
I don’t know the no doubt large literature on this topic well at all, but the Bayesian’s interpretation or definition of probability isn’t that clear to me. The Wikipedia entry on Bayesian probability says:
Broadly speaking, there are two interpretations of Bayesian probability. For objectivists, who interpret probability as an extension of logic, probability quantifies the reasonable expectation that everyone (even a “robot”) who shares the same knowledge should share in accordance with the rules of Bayesian statistics, which can be justified by Cox’s theorem.[2][8] For subjectivists, probability corresponds to a personal belief.[3] Rationality and coherence allow for substantial variation within the constraints they pose; the constraints are justified by the Dutch book argument or by decision theory and de Finetti’s theorem.[3] The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.
Wikipedia entry on Bayesian probability
Section 1.5 ‘Probability as a measure of uncertainty’ of Bayesian Data Analysis talks about the way Bayesian analysis uses probability as a measure of uncertainty, but to my mind it doesn’t really define the concept. This is not a criticism. As Gelman et al say earlier in their book:
Rather than argue the foundations of statistics—see the bibliographic note at the end of this chapter for references to foundational debates—we prefer to concentrate on the pragmatic advantages of the Bayesian framework, whose flexibility and generality allow it to cope with complex problems.
If part of the appeal of Bayesian inference is that it answers the question we really want (i.e. conditional on what we’ve seen, what do we know / believe about the parameter(s)), it seems to me that the interpretation or definition of prior/posterior probabilities should be relatively straightforward and clear to us. But for me at least, it isn’t. I am quite confident (whatever I mean by that!) that this reflects my ignorance on the topic. Part of my motivation for writing this post is the hope that people will help me understand better how to unambiguously define what is meant by a Bayesian prior/posterior probability. Please write a comment if you can help in this regard.
The above does not mean I don’t like Bayesian methods. Indeed, for much of the last 10 years I have been working with and using methods like multiple imputation for missing data, whose development took place in the Bayesian paradigm. For me this is fine because I know that methods like multiple imputation have good frequentist properties, and while there are definitely interpretational issues with things like confidence intervals, I at least think I understand what they claim to do/be.
22nd November 2022 postscript
This evening there was further discussion of this topic on Twitter. As part of this Frank Harrell offered an interpretation for the Bayesian credible interval as follows:
I followed up with him as to the nature of the probability being referred to here, since it is clear that the probability notion being invoked is broader than, or distinct from, the relative frequency notion of probability. Frank helpfully pointed me in the direction of the entry for ‘probability’ in his course glossary of terms, which can be accessed here. Part of this entry says:
The meaning attached to the metric known as a probability is up to the user; it can represent long-run relative frequency of repeatable observations, a degree of belief, or a measure of veracity or plausibility.
https://hbiostat.org/doc/glossary.pdf
and
There are other schools of probability that do not require the notion of replication at all. For example, the school of subjective probability (associated with the Bayesian school) “considers probability as a measure of the degree of belief of a given subject in the occurrence of an event or, more generally, in the veracity of a given assertion” (see P. 55 of [5]). de Finetti defined subjective probability in terms of wagers and odds in betting. A risk-neutral individual would be willing to wager $P that an event will occur when the payoff is $1 and her subjective probability is P for the event.
https://hbiostat.org/doc/glossary.pdf
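The betting definition can be written out as simple arithmetic. The sketch below is illustrative only: the function names and the prices are hypothetical, not taken from the glossary. The first function shows that a wager of $P for a $1 payoff has zero expected gain exactly when the bettor's subjective probability is P. The second shows the coherence idea behind the Dutch book argument mentioned in the Wikipedia quote: paying prices for an event and its complement that sum to more than the payoff guarantees a loss.

```python
def expected_gain(stake, belief, payoff=1.0):
    """Expected net gain from paying `stake` for a ticket that pays `payoff` if the event occurs."""
    return belief * payoff - stake

def sure_loss(price_event, price_complement, payoff=1.0):
    """Guaranteed loss from buying tickets on both an event and its complement.

    Exactly one of the two tickets pays out, so the buyer always receives `payoff`.
    """
    return price_event + price_complement - payoff

# Fair wager: the stake equals the subjective probability.
print(expected_gain(stake=0.8, belief=0.8))
# Incoherent prices (they sum to more than 1) lose money whatever happens.
print(sure_loss(0.6, 0.6))
```

Nothing in this arithmetic refers to repetitions of the event, which is the point of the subjective definition.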
As I wrote in the original post, I do not know the extensive literature on this topic well at all. But what Frank has summarised here is really useful in aiding my understanding of what the interpretation should be of the 95% probability statement attached to a 95% credible interval. It seems to me the notion of probability being invoked when interpreting a 95% credible interval has to be the subjective probability / degree of belief one described in the previous quote. Whether this definition/interpretation meets the criterion of being ‘exact’, as required by Frank in his tweet at the top of this post when asking for the exact interpretation of a particular realised frequentist confidence interval, I leave readers to decide.
I’m not clear on what it is that you don’t understand about interpretation of Bayesian credible intervals.
Both “objective” and “subjective” Bayesians interpret them as degrees of belief – that, for instance, one would use to make bets (supposing, of course, that you have no moral objection to gambling, etc.).
The difference is that “objective” Bayesians think that one can formalize “what one knows” and then create an “objective” prior on that basis, that everyone “with the same knowledge” would agree is correct. I don’t buy this. Formalizing “what one knows” by any means other than specifying a prior (which would defeat the point) seems impossible. And supposing one did, there is disagreement about what an “objective” prior based on it would be. To joke, “The best thing about objective priors is there are so many of them to choose from!”.
Many simple examples can illustrate that the objective Bayesian framework just isn’t going to work. One example is the one-way random effects model, where the prior on the variance of the random effects will sometimes have a large influence on the inference (eg, on the posterior probability that the overall mean is positive), but where there is no sensible “objective” prior – you just have to subjectively specify how likely it is that the variance is very close to zero. Another even simpler example is inference for theta given an observation x~N(theta,1), when it is known (with certainty) that theta is non-negative, and the observed x is -1. There’s just no alternative to subjectively deciding how likely a priori it is that theta is close to zero.
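The second example can be explored numerically. The sketch below is only an illustration under stated assumptions: the two priors (flat on the non-negative half line, and an exponential prior with rate 10 that concentrates mass near zero) and the cutoff of 0.5 are hypothetical choices, and the posterior is computed by simple midpoint-rule integration on a truncated grid.

```python
import math

def posterior_prob_below(cutoff, x, prior_density, upper=12.0, steps=24000):
    """P(theta < cutoff | x) for x ~ N(theta, 1) with theta >= 0."""
    h = upper / steps
    total = below = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoint of the i-th grid cell
        # Unnormalised posterior: likelihood times prior density.
        w = math.exp(-0.5 * (x - theta) ** 2) * prior_density(theta)
        total += w
        if theta < cutoff:
            below += w
    return below / total

def flat_prior(theta):
    return 1.0

def near_zero_prior(theta):
    # Exponential prior with (hypothetical) rate 10.
    return math.exp(-10.0 * theta)

for name, prior in (("flat", flat_prior), ("exponential, rate 10", near_zero_prior)):
    print(name, posterior_prob_below(0.5, -1.0, prior))
```

With x = -1 the data say little beyond "theta is small", so the two priors give quite different posterior probabilities that theta is below 0.5.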
Frequentist methods also don’t give sensible answers in these examples. Subjective Bayesianism is the only way.
Many thanks Radford. I guess what I’m looking for then is a more precise definition of the phrase ‘degree of belief’. If this definition involves how I would behave when offered bets about the occurrence of random/uncertain events, does this not end up depending on the notion of probability as a relative frequency of events occurring?
Not in the Bayesian view of things. Consider betting on the mass of the Higgs Boson. There’s only one value, not repeated (unless you go for a multiverse-style cosmology), so what could a frequentist view of the bet possibly be? But clearly betting on the mass of the Higgs Boson is not a meaningless thing to do.
And in normal life, people make all sorts of decisions – eg, Should I try to cross this street in this location, considering how much traffic there is? Is that person in the distance that I see my cousin or not? – that could be regarded as being based on frequentist probabilities only with very artificial interpretations.
So for the frequentist there is a fixed unknown mass of the Higgs Boson. If someone offers them a bet as to whether it lies above a certain value, the fact they have offered this bet doesn’t make it random (to the frequentist). You ask what could a frequentist view of the bet possibly be, which I take to mean how might they decide whether to take the bet or not. Here’s one approach they could take. Let’s suppose the frequentist has constructed a one-sided lower 90% confidence interval for the mass of the Higgs Boson based on all the experimental data to date. If the bet says “you win if the true value turns out to be greater than x”, then the frequentist could take the bet if their one-sided interval contained x. They wouldn’t be able to make any statement about the probability that they win this bet if they take it, since if they take it, whether they win it is an unknown but non-random 0/1 value. However, they could, I would think, construct some guarantees about how often they would win bets if they decide to take them according to this procedure across a population of hypotheses, where for each hypothesis they have data, under some assumptions. I would guess people have gone through these sorts of arguments/calculations in the literature, but I’m not familiar with the literature on this.
Your reply said in the Bayesian view of things the definition of degree of belief does not rely on relative frequencies. It would be great if you could expand on how you would define it then (without using relative-frequency-type notions). I am genuinely looking for more mathematical descriptions of how it should be defined/interpreted which go beyond “the probability/belief the parameter lies in the interval is 95%”.
If a frequentist could get someone to commit to always accepting a bet, then the frequentist would indeed win 90% of the bets that the true parameter is in their 90% C.I., in the long run. But of course this isn’t a realistic betting scenario – if someone makes that commitment, one could win 100% of the time by just always betting the parameter is in an unbounded interval. If the other person doesn’t always accept any bet, then there are theorems about how one has to be Bayesian to avoid getting exploited. Those and other theoretical consistency requirements are often put forth as reasons to be Bayesian, but of course these arguments all involve various assumptions, which explains why not every statistician is Bayesian. As is the case for any paradigm, the real reason to be Bayesian comes from working in the framework and seeing how in practice it coheres in a way that doesn’t happen for frequentist statistics.
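The long-run claim in the first sentence can be checked with a small simulation. This is a sketch under assumptions not in the comment: each problem has a single observation x ~ N(theta, 1), the interval is the one-sided 90% interval from x - 1.2816 upwards, and the true values are spread arbitrarily over (-50, 50) simply to generate many different problems.

```python
import random

def win_rate(problems=20000, seed=1):
    """Share of bets won when always betting that theta lies in a one-sided 90% interval."""
    rng = random.Random(seed)
    z90 = 1.2816  # approximate 90% quantile of the standard normal
    wins = 0
    for _ in range(problems):
        theta = rng.uniform(-50.0, 50.0)  # a different true value in each problem
        x = rng.gauss(theta, 1.0)         # the single observation for that problem
        lower = x - z90                   # the interval is [lower, infinity)
        if theta >= lower:
            wins += 1
    return wins / problems

print(win_rate())  # should be close to 0.90
```

The 90% here is a property of the procedure across problems; it says nothing about which individual bets are won.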
On the other hand, there are problems. The computational difficulty of some Bayesian inference problems is one – what do you do if you know how to get the right answer, but don’t have the computational power to actually get it? The intellectual difficulty in formulating complex prior knowledge in high dimensional spaces is another problem. But I think it helps a lot to be clear on what one would ideally like to do before coming up with some hack to try to get something close to that in practice. (As opposed to hacking around with no idea what you’re trying to do.)
That x~N(theta,1) is a great example actually for showing Bayesian tests can go wrong if you pick inappropriate priors. From Lindley, X|mu ~ N(mu,1). The test is H0: mu=0 vs Ha: mu>0. The priors on the parameter really don’t matter, but say Pr(mu=0)=.50 and Pr(mu>0)=.50. In an attempt to use a noninformative prior, take the density of mu given mu>0 to be flat on the half line. Note that this is an improper prior, but similar proper priors lead to similar results. The Bayesian test compares the density of the data X under H0 to the average density of the data under Ha. The average density under the alternative makes any X you could possibly see infinitely more probable to have come from the null distribution than the alternative. Thus, anything you could possibly see will cause you to accept mu=0. Effectively, all the probability is placed on unreasonably large values of mu so by comparison mu=0 always looks reasonable.
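The "similar proper priors" remark can be illustrated by replacing the flat prior on the half line with a uniform prior on (0, c) and letting c grow. This is a sketch under that assumption; the observed value x = 3 and the values of c are arbitrary. Under Ha the average density of x is (Phi(x) - Phi(x - c)) / c, where Phi is the standard normal distribution function, and this shrinks towards zero as c increases.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def posterior_prob_null(x, c, prior_null=0.5):
    """P(mu = 0 | x) when mu | Ha is uniform on (0, c) and X | mu ~ N(mu, 1)."""
    density_null = norm_pdf(x)
    # Average of the N(mu, 1) density at x over mu uniform on (0, c).
    density_alt = (norm_cdf(x) - norm_cdf(x - c)) / c
    numerator = prior_null * density_null
    return numerator / (numerator + (1.0 - prior_null) * density_alt)

for c in (10, 1000, 1000000):
    print(c, posterior_prob_null(3.0, c))
```

Even for an observation three standard deviations above zero, the posterior probability of mu=0 moves towards 1 as the prior under the alternative is spread over ever larger values.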
I’d recommend the introduction of Schweder and Hjort’s “Confidence, Likelihood, Probability” (https://www.uio.no/studier/emner/matnat/math/STK4180/v16/clp_oct2015_pages1to205.pdf) for a great discussion of interpreting posterior probabilities.
The introduction and rest of the book also give a nice presentation of valid posterior interpretation of the values of specific realized frequentist intervals as quantiles of a confidence distribution. The long-run coverage properties of a frequentist confidence interval do not refer to repetitions of a specific experiment, they refer to the properties of the estimator applied to classes of problems.
The Higgs boson example is actually a perfect example of the type of problem frequentist inference aims to solve. There is a single unchanging parameter with an unknown value. Measurements are taken to estimate the value of that parameter. This is precisely the type of case for which frequentist inference was designed.
You’re right that the Higgs Boson example is of the type that frequentist inference is meant to solve. The question is of course whether or not it succeeds. If the purpose is to provide a rational justification for making a bet, then it doesn’t.
Physicists actually seem to regard frequentist statistics as a ritual that they need to go through in order to justify a claim to have “discovered” something. The ritual has the side effect of mostly eliminating false “discoveries”. Similarly, a religious cleanliness ritual might have the side effect of reducing disease transmission. In neither case do those conducting the ritual have a rational justification for what they are doing.
I’m doing some updates of this page (http://www.statisticool.com/objectionstofrequentism.htm), but here it is as of now. While there are general axioms that defend any notion of probability practically, note that Kolmogorov himself wrote “The basis for the applicability of the results of the mathematical theory of probability to real ‘random phenomena’ must depend on some form of the frequency concept of probability, the unavoidable nature of which has been established by von Mises in a spirited manner.” I’ll note for Frank that likelihoods swamp priors, not the other way around, which kind of emphasizes the non-importance of priors and hence beliefs when you have data. Also, how does one interpret a CI if different, equally defensible priors were used and one interval is say [.7, .9] but the other is [.1, .3]? How much credence in the 95% does that give the first (or second) interval?
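Both points, priors mattering with little data and the likelihood dominating with a lot, can be seen in a toy conjugate example. Everything here is a hypothetical illustration: a binomial model, two opposed priors Beta(2, 8) and Beta(8, 2), data in which 80% of trials are successes, and a normal approximation to the central 95% posterior interval rather than exact Beta quantiles.

```python
import math

def approx_interval(a, b, z=1.96):
    """Normal approximation to a central 95% interval of a Beta(a, b) posterior."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean - z * sd, mean + z * sd

def compare_priors(n, rate=0.8):
    """Intervals under Beta(2, 8) and Beta(8, 2) priors for the same hypothetical data."""
    successes = round(rate * n)
    failures = n - successes
    return (approx_interval(2 + successes, 8 + failures),
            approx_interval(8 + successes, 2 + failures))

for n in (10, 100, 10000):
    print(n, compare_priors(n))
```

With 10 observations the two intervals differ noticeably; with 10000 they nearly coincide.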