Independence of sample mean and sample variance

A student asked me a good question today about whether it is really the case that the sample mean and sample variance are independent random variables, given that both are functions of the same data. That this is true when the observations come from a normal distribution can be shown relatively easily – see here for example.

Is this independence true more generally? The answer is no. As a simple example, suppose that each of the observations come from a chi-squared distribution on one degree of freedom. This means that each of the observations is the square of an independent standard normal random variable. As such, their values are all positive. To see why the sample mean and sample variance are now dependent, suppose that the sample mean is small, and close to zero. Given that the observations are all positive, the only way the sample mean can be close to zero is if their variability around the sample mean is also small. That is, if the sample mean is small, we should expect the sample variance to also be small, and hence they are positively correlated random variables.

A quick simulation in R confirms this intuition:

set.seed(623423)
nSim <- 1000
n <- 10
ests <- array(0, dim=c(nSim,2))

for (i in 1:nSim) {

  #create sample from chi-squared on 1 d.f.
  x <- rnorm(n,mean=0,sd=1)^2
  #store sample mean and variance
  ests[i,] <- c(mean(x), var(x))

}

plot(ests[,1],ests[,2], xlab="Sample mean", ylab="Sample variance")

3 thoughts on “Independence of sample mean and sample variance”

  1. Very interesting! I noticed that even small departures from normality induce dependence.

    set.seed(623423)
    # https://r-forge.r-project.org/R/?group_id=2149
    library(twopiece)
    nSim <- 10000
    n <- 100
    ests <- array(0, dim=c(nSim,2))

    for (i in 1:nSim) {

    #create sample from a twopiece normal distribution
    x <- rtp3(n = n, mu = 0, par1 = 1, par2 = 1.5, FUN = rnorm)
    #store sample mean and variance
    ests[i,] <- c(mean(x), var(x))

    }

    plot(ests[,1],ests[,2], xlab="Sample mean", ylab="Sample variance")
    # Checking the slope
    summary(lm(ests[,2] ~ ests[,1]))

    Reply

Leave a Reply to Mostafa Ismail AtallahCancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.