When you estimate a proportion and want a standard error for the estimate, you would normally calculate it by assuming that the number of ‘successes’ in the sample is a draw from a binomial distribution. The binomial distribution counts the number of successes in a series of independent Bernoulli 0/1 draws, where each draw has some probability of ‘success’. Does the model rely on, or assume, that the success probability is the same for each of these binary observations? In the third paragraph of this blog post Frank Harrell argues (or seems to argue) that it does. In this post I’ll delve into this a bit further, using the same numerical example Frank gives.

Suppose we have a random sample of n individuals on whom we observe a binary outcome indicating presence or absence of disease. Suppose that in a sample of n=100, 40 have the disease, so that our estimate of the proportion with disease in the population from which the sample was drawn (which I will denote $\pi$) is $\hat{\pi}=40/100=0.4$.
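To make the numbers concrete, here is a minimal sketch of the usual calculation, assuming the standard plug-in binomial formula for the standard error, the square root of (estimate × (1 − estimate) / n). The function name `binomial_proportion_se` is my own, chosen for illustration, and does not come from any library.

```python
import math


def binomial_proportion_se(successes, n):
    """Return (estimate, standard error) for a sample proportion.

    Assumes the number of successes is binomial, so the estimated
    variance of the sample proportion is p_hat * (1 - p_hat) / n.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


# The example from the text: 40 diseased individuals out of n = 100.
p_hat, se = binomial_proportion_se(40, 100)
print(f"estimate = {p_hat:.3f}, standard error = {se:.4f}")
# prints: estimate = 0.400, standard error = 0.0490
```

So under the binomial assumption the estimate of 0.4 comes with a standard error of roughly 0.049.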