Does a Bernoulli/binomial model really assume everyone has the same probability p?

When you estimate a proportion and want a standard error for the estimate, you would normally calculate it by assuming that the number of ‘successes’ in the sample is a draw from a binomial distribution. The binomial distribution counts the number of successes in a series of n independent Bernoulli 0/1 draws, each of which has probability p of ‘success’. Does the model assume that the success probability is the same for each of these binary observations? In the third paragraph of this blog post, Frank Harrell (seems to) argue that it does. In this post I’ll delve into this a bit further, using the same numerical example Frank gives.

Suppose we have a random sample of n individuals on whom we observe a binary outcome indicating presence or absence of disease. Suppose that in a sample of n=100, 40 have the disease. Our estimate of the proportion with disease in the population from which the sample was drawn (which I will denote p) is then \hat{p}=40/100=0.4.
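For reference, the usual binomial-based standard error for this example is \sqrt{\hat{p}(1-\hat{p})/n}. The short Python sketch below (Python is used only for illustration, and the helper name binomial_se is hypothetical) computes it for 40 out of 100, together with a 95% Wald interval, which is one common choice of interval among several.

```python
from statistics import NormalDist
import math

def binomial_se(successes: int, n: int) -> tuple[float, float]:
    """Return the sample proportion and its plug-in binomial standard error.

    Hypothetical helper: treats the success count as a draw from
    Binomial(n, p) and plugs p_hat in for p in sqrt(p(1 - p) / n).
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

# Frank Harrell's example: 40 of 100 sampled individuals have the disease.
p_hat, se = binomial_se(40, 100)

# 95% Wald interval: p_hat +/- z * SE, with z the 97.5th normal percentile.
z = NormalDist().inv_cdf(0.975)
lower, upper = p_hat - z * se, p_hat + z * se
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

This gives an SE of about 0.049 and an interval of roughly 0.304 to 0.496.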


Interpretation of frequentist confidence intervals and Bayesian credible intervals

This post was prompted by a tweet by Frank Harrell yesterday asking:

In this post I’ll say a little about trying to answer Frank’s question, and then a little about an alternative question which I posed in response, namely how the interpretation changes if the interval is a Bayesian credible interval rather than a frequentist confidence interval.


P-values after multiple imputation using mitools in R

I’ve been using Thomas Lumley’s excellent mitools package in R for applying Rubin’s rules for multiple imputation ever since I wrote the smcfcs package in R. Somebody recently asked me how they could obtain p-values corresponding to the Rubin’s rules results calculated by the MIcombine function in mitools. In this short post I’ll give some R code to calculate these.
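As a language-neutral illustration of the calculation involved, here is a Python sketch of Rubin’s rules for a single scalar parameter as they are usually stated: pool the m estimates, form the total variance T = W + (1 + 1/m)B from the within-imputation variance W and between-imputation variance B, use Rubin’s (1987) degrees of freedom ν = (m − 1)(1 + 1/r)² with r = (1 + 1/m)B/W, and compare the estimate divided by its pooled SE with a t distribution on ν degrees of freedom. This is an assumption-laden sketch, not the R code from this post and not mitools’ internals: the function names rubin_pool and two_sided_t_pvalue are hypothetical, and no small-sample degrees-of-freedom adjustment is included. To stay within the Python standard library, the t tail probability is computed through a continued-fraction evaluation of the regularized incomplete beta function; in R the same quantity would come from 2*pt(-abs(t), df).

```python
import math
from statistics import NormalDist

# Hypothetical helper names throughout; none of these are part of mitools.

def _betacf(a, b, x, max_iter=10000, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def two_sided_t_pvalue(t_stat, df):
    """P(|T| >= |t_stat|) for T ~ t(df), via P = I_{df/(df+t^2)}(df/2, 1/2)."""
    if math.isinf(df):
        return 2.0 * NormalDist().cdf(-abs(t_stat))
    return betainc(df / 2.0, 0.5, df / (df + t_stat * t_stat))

def rubin_pool(estimates, variances):
    """Pool m estimates and their squared SEs with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = sum(estimates) / m                              # pooled estimate
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = w + (1.0 + 1.0 / m) * b                         # total variance T
    if b == 0.0:
        df = math.inf                                       # no imputation uncertainty
    else:
        r = (1.0 + 1.0 / m) * b / w                         # relative increase in variance
        df = (m - 1) * (1.0 + 1.0 / r) ** 2                 # Rubin (1987) df
    return q_bar, total, df

# Illustrative (made-up) results from m = 5 imputed datasets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04] * 5
q_bar, total, df = rubin_pool(estimates, variances)
t_stat = q_bar / math.sqrt(total)
p_value = two_sided_t_pvalue(t_stat, df)
print(f"estimate={q_bar:.3f} SE={math.sqrt(total):.3f} df={df:.1f} p={p_value:.4f}")
```

With these made-up numbers the between-imputation variance inflates the SE from 0.2 to about 0.265, and ν is about 21.8. As B comes to dominate W, ν shrinks toward m − 1, which is why a t rather than a normal reference distribution is used.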
