A student asked me a good question today about whether it is really the case that the sample mean and sample variance are independent random variables, given that both are functions of the same data. That this is true when the observations come from a normal distribution can be shown relatively easily – see here for example.
Is this independence true more generally? The answer is no. As a simple example, suppose that each of the observations comes from a chi-squared distribution with one degree of freedom. This means that each observation is the square of an independent standard normal random variable, so its value is always positive. To see why the sample mean and sample variance are now dependent, suppose that the sample mean is small, and close to zero. Given that the observations are all positive, the only way the sample mean can be close to zero is if every observation is close to zero, in which case their variability around the sample mean is also small. That is, if the sample mean is small, we should expect the sample variance to also be small, and so the two statistics are dependent (in fact, positively correlated).
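For readers who want something more formal than the intuition above, here is a short sketch of a standard result. The notation is mine rather than anything defined earlier in the post: the observations $X_1, \dots, X_n$ are assumed i.i.d. with mean $\mu$ and finite third central moment $\mu_3$, $\bar{X}$ is the sample mean, and $S^2$ is the usual sample variance with divisor $n-1$. Under those assumptions,

$$
\operatorname{Cov}(\bar{X}, S^2) = \frac{\mu_3}{n}, \qquad \mu_3 = E\big[(X_1 - \mu)^3\big].
$$

To see why, take $\mu = 0$ without loss of generality (shifting the data changes neither $S^2$ nor $\mu_3$, and shifts $\bar{X}$ only by a constant), write $S^2 = \frac{1}{n-1}\big(\sum_i X_i^2 - n\bar{X}^2\big)$, and use independence to drop every cross term that contains a lone factor $X_i$:

$$
E\Big[\bar{X} \sum_{i} X_i^2\Big] = \mu_3, \qquad
E\big[\bar{X}^3\big] = \frac{\mu_3}{n^2}, \qquad
E\big[\bar{X} S^2\big] = \frac{1}{n-1}\Big(\mu_3 - \frac{\mu_3}{n}\Big) = \frac{\mu_3}{n}.
$$

For the normal distribution $\mu_3 = 0$, which is consistent with independence (although zero covariance alone would not prove it). For the chi-squared distribution with one degree of freedom $\mu_3 = 8$, so the covariance is $8/n > 0$, in line with the argument above.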
A quick simulation in R confirms this intuition:
set.seed(623423)
nSim <- 1000
n <- 10
ests <- array(0, dim=c(nSim,2))
for (i in 1:nSim) {
  # create sample from chi-squared on 1 d.f.
  x <- rnorm(n, mean=0, sd=1)^2
  # store sample mean and variance
  ests[i,] <- c(mean(x), var(x))
}
plot(ests[,1],ests[,2], xlab="Sample mean", ylab="Sample variance")
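To put a number on what the scatter plot shows, one could also compute the empirical correlation and covariance of the two statistics. The snippet below is only a sketch: it draws the data with `rchisq()` instead of squaring `rnorm()` draws, so it will not reproduce the exact sample used for the plot, and the names `draws`, `means` and `vars` are mine.

```r
set.seed(623423)
nSim <- 1000
n <- 10

# Each column of `draws` is one simulated sample of size n from chi-squared(1)
draws <- matrix(rchisq(nSim * n, df = 1), nrow = n)
means <- colMeans(draws)
vars  <- apply(draws, 2, var)

# Empirical association between the sample mean and the sample variance
cor(means, vars)
cov(means, vars)

# For i.i.d. data the covariance equals (third central moment) / n;
# the third central moment of chi-squared(1) is 8, so the target here is 8 / n
8 / n
```

Because the sample variance of chi-squared data is heavy-tailed, the empirical covariance can be fairly noisy at `nSim <- 1000`; increasing `nSim` should bring it closer to `8 / n`.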
Very interesting! I noticed that even small departures from normality induce dependence.
set.seed(623423)
# https://r-forge.r-project.org/R/?group_id=2149
library(twopiece)
nSim <- 10000
n <- 100
ests <- array(0, dim=c(nSim,2))
for (i in 1:nSim) {
  # create sample from a twopiece normal distribution
  x <- rtp3(n = n, mu = 0, par1 = 1, par2 = 1.5, FUN = rnorm)
  # store sample mean and variance
  ests[i,] <- c(mean(x), var(x))
}
plot(ests[,1],ests[,2], xlab="Sample mean", ylab="Sample variance")
# Checking the slope
summary(lm(ests[,2] ~ ests[,1]))
Very interesting and nice post. Thanks Jonathan.
Thank you very much for this awesome tutorial. Is there a rigorous mathematical proof I can check?