Interpretation of frequentist confidence intervals and Bayesian credible intervals

This post was prompted by a tweet from Frank Harrell yesterday, asking how one should interpret a frequentist confidence interval.

In this post I’ll say a little about trying to answer Frank’s question, and then turn to an alternative question I posed in response: how does the interpretation change if the interval is a Bayesian credible interval rather than a frequentist confidence interval?


P-values after multiple imputation using mitools in R

I’ve been using Thomas Lumley’s excellent mitools package in R to apply Rubin’s rules for multiple imputation ever since I wrote the smcfcs package in R. Somebody recently asked me how they could obtain p-values corresponding to the Rubin’s rules results calculated by the MIcombine function in mitools. In this short post I’ll give some R code to calculate these.
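As a preview, here is a minimal sketch of the kind of calculation involved (not necessarily the exact code from the full post). It uses the coefficients, variance and df components of the object returned by MIcombine to form Wald-type t statistics, and it fabricates some complete datasets purely so the sketch runs end to end; in practice the imputed datasets would come from your imputation procedure (e.g. smcfcs):

library(mitools)

set.seed(1234)
#in practice these would be your multiply imputed datasets; here five
#complete datasets are generated purely as stand-ins
imps <- imputationList(lapply(1:5, function(i) data.frame(x=rnorm(100), y=rnorm(100))))

#fit the analysis model in each imputed dataset and pool with Rubin's rules
models <- with(imps, lm(y~x))
pooled <- MIcombine(models)

#Wald-type p-values: t statistics formed from the pooled estimates and
#standard errors, referred to a t distribution with the degrees of
#freedom stored by MIcombine
tStat <- pooled$coefficients/sqrt(diag(pooled$variance))
2*pt(-abs(tStat), df=pooled$df)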


Independence of sample mean and sample variance

A student asked me a good question today about whether it is really the case that the sample mean and sample variance are independent random variables, given that both are functions of the same data. That this is true when the observations come from a normal distribution can be shown relatively easily – see here for example.
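Although not a proof, the normal case is also easy to check numerically: simulate many normal samples and look at the correlation between the sample means and sample variances across simulations, which should be close to zero. The following sketch (with an arbitrary seed and sample sizes of my choosing) does this:

set.seed(62342)
nSim <- 10000
n <- 10
estsNorm <- array(0, dim=c(nSim,2))

for (i in 1:nSim) {
  #sample n observations from a standard normal distribution
  x <- rnorm(n, mean=0, sd=1)
  #store the sample mean and sample variance
  estsNorm[i,] <- c(mean(x), var(x))
}

#under normality the correlation between the simulated sample means
#and variances should be close to zero
cor(estsNorm[,1], estsNorm[,2])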

Is this independence true more generally? The answer is no. As a simple example, suppose that each observation comes from a chi-squared distribution on one degree of freedom, i.e. each observation is the square of an independent standard normal random variable, so all the values are positive. To see why the sample mean and sample variance are now dependent, suppose that the sample mean is close to zero. Since the observations are all positive, the mean can only be close to zero if every observation is close to zero, in which case their variability around the sample mean must also be small. That is, if the sample mean is small we should expect the sample variance to also be small, and hence the two are positively correlated random variables.

A quick simulation in R confirms this intuition:

set.seed(623423)
nSim <- 1000
n <- 10
ests <- array(0, dim=c(nSim,2))

for (i in 1:nSim) {
  #create a sample of n observations from a chi-squared distribution
  #on 1 d.f. by squaring standard normal draws
  x <- rnorm(n, mean=0, sd=1)^2
  #store the sample mean and sample variance
  ests[i,] <- c(mean(x), var(x))
}

plot(ests[,1], ests[,2], xlab="Sample mean", ylab="Sample variance")
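To put a number on the positive dependence visible in the plot, we can also compute the correlation between the simulated means and variances (a small addition to the simulation above):

#correlation between the simulated sample means and variances;
#a positive value is consistent with the argument above
cor(ests[,1], ests[,2])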