The Stats Geek

The mean of residuals in linear regression is always zero

March 23, 2020 by Jonathan Bartlett

In an introductory course on linear regression one learns about various diagnostics which might be used to assess whether the model is correctly specified. One of the assumptions of linear regression is that the errors have mean zero, conditional on the covariates. This implies that the unconditional or marginal mean of the errors is also zero.
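As a minimal sketch of the point in the title, the R code below fits a deliberately misspecified linear model (a straight line fitted to data with a quadratic trend) and computes the mean of the residuals. The data-generating model and the variable names are my own illustrative assumptions, not taken from the original post. Because the fitted model includes an intercept, least squares forces the residuals to sum to zero, so their mean is zero up to numerical precision even though the model is wrong.

```r
set.seed(1234)
n <- 100

# Illustrative (assumed) data: the true relationship is quadratic in x
x <- rnorm(n)
y <- 1 + x^2 + rnorm(n)

# Fit a misspecified model that is linear in x, with an intercept
fit <- lm(y ~ x)

# Mean of the residuals: zero up to floating point error
meanResid <- mean(residuals(fit))
print(meanResid)
```

The residual mean being zero here therefore tells us nothing about whether the mean zero assumption for the errors actually holds.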

Independence of sample mean and sample variance

October 28, 2019 by Jonathan Bartlett

A student asked me a good question today about whether it is really the case that the sample mean and sample variance are independent random variables, given that both are functions of the same data. That this is true when the observations come from a normal distribution can be shown relatively easily – see here for example.

Is this independence true more generally? The answer is no. As a simple example, suppose that each of the observations comes from a chi-squared distribution on one degree of freedom. This means that each of the observations is the square of an independent standard normal random variable. As such, their values are all positive. To see why the sample mean and sample variance are now dependent, suppose that the sample mean is small, and close to zero. Given that the observations are all positive, the only way the sample mean can be close to zero is if their variability around the sample mean is also small. That is, if the sample mean is small, we should expect the sample variance to also be small, and hence they are positively correlated random variables.

A quick simulation in R confirms this intuition:

set.seed(623423)
nSim <- 1000
n <- 10
ests <- array(0, dim=c(nSim,2))

for (i in 1:nSim) {

  #create sample from chi-squared on 1 d.f.
  x <- rnorm(n,mean=0,sd=1)^2
  #store sample mean and variance
  ests[i,] <- c(mean(x), var(x))

}

plot(ests[,1],ests[,2], xlab="Sample mean", ylab="Sample variance")
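To put a number on what the plot shows, one option is to compute the sample correlation between the simulated means and variances, and to compare it with the same quantity when the data are normal. The sketch below does this; the helper function `simMeanVar` and the other new object names are hypothetical, introduced only for this illustration. Note that a correlation near zero does not by itself establish independence; in the normal case independence is known from theory, and the simulation is merely consistent with it.

```r
set.seed(623423)
nSim <- 1000
n <- 10

# Hypothetical helper: simulate nSim samples of size n using the generator
# rgen and return an nSim x 2 matrix of (sample mean, sample variance)
simMeanVar <- function(rgen) {
  t(replicate(nSim, {
    x <- rgen(n)
    c(mean(x), var(x))
  }))
}

# Chi-squared on 1 d.f. (squared standard normals) versus standard normal
chisqEsts <- simMeanVar(function(n) rnorm(n)^2)
normEsts <- simMeanVar(function(n) rnorm(n))

# Correlation between sample mean and sample variance in each setting
corChisq <- cor(chisqEsts[,1], chisqEsts[,2])
corNorm <- cor(normEsts[,1], normEsts[,2])
print(c(chisq = corChisq, normal = corNorm))
```

One would expect a clearly positive correlation in the chi-squared case and a value close to zero in the normal case.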

PhD in estimands/causal inference in trials (UK/EU)

October 4, 2019 by Jonathan Bartlett

If you are a UK/EU resident interested in pursuing a PhD on estimands/causal inference in clinical trials, please see the advert here. There is (rightly) increasing emphasis in clinical trials on clear specification of the scientific question and hence target estimand or parameter.

While one might think the process of choosing and specifying the estimand is usually easy, in many settings various things can happen during follow-up which complicate this. Examples include patients changing treatments, failing from competing risks, or dying before the endpoint of interest can be measured. This has led to the ICH E9 addendum on estimands, whose final version will soon be published. There remain a number of areas where deciding what the most appropriate estimand is and how one can validly estimate it from the observable data is challenging, and this PhD will seek to address some of these outstanding areas. For more background on this area, I’d recommend reading this paper.

The PhD will be based at the University of Bath, with me as primary supervisor. The student will benefit from additional supervision from leading researchers in causal inference: Rhian Daniel (Cardiff), Jack Bowden (Bristol) and Daniel Farewell (Cardiff).

For information about funding and the application process, please see the information here. The application deadline is 25th November 2019.