The robust sandwich variance estimator for linear regression (theory)

In a previous post we looked at the properties of the ordinary least squares linear regression estimator when the covariates, as well as the outcome, are considered as random variables. In this post we'll look at the theory sandwich (sometimes called robust) variance estimator for linear regression. See this post for details on how to use the sandwich variance estimator in R.

In the earlier post we assumed that $E(Y|X)=X^{T}\beta$, where $\beta$ is a column vector of regression coefficients to be estimated.

Using estimating equation theory, we showed that the estimator has variance

$n^{-1}A(\beta^{*})^{-1}B(\beta^{*})(A(\beta^{*})^{-1})^{T}$

where $A(\beta)$ denotes the matrix is equal to minus the derivative of the estimating function with respect to the parameter $\beta$, $B(\beta)$ denotes the variance covariance matrix of the estimating function, and $\beta^{*}$ denotes the true value of $\beta$. We then found expressions for these population quantities, and estimators for them. We found that

$A(\beta)=E(-\frac{\partial}{\partial \beta} X(Y-X^{T}\beta)) = E(XX^{T})$

and that this could be estimated by its empirical mean

$\hat{E}(XX^{T})=n^{-1}\sum^{n}_{i=1}X_{i}X_{i}^{T}$

The matrix $B(\beta)$ was given as the variance of the estimating function

$B(\beta)=Var(X(Y-X^{T}\beta))=E(Var(\epsilon|X)XX^{T})$

In the previous post, we then derived an expression for this assuming that the residuals $\epsilon=Y-X^{T}\beta$ have constant variance (as the covariates vary). Here we will relax that assumption, such that $\epsilon$ may have a variance that varies with $X$. In this case, how might we estimate $B(\beta)$? Since by assumption the residuals have mean zero conditional on $X$, if subject $i$ has predictor values $X_{i}$, remembering that the variance is the average squared deviation around the mean, we can estimate $Var(\epsilon_{i}|X_{i})$ by the square of the estimated residual residual $\hat{\epsilon}_{i}=Y_{i}-X_{i}^{T}\hat{\beta}$, i.e.

$\hat{Var}(\epsilon_{i}|X_{i})=(Y_{i}-X_{i}^{T}\hat{\beta})^{2}$

The matrix $B(\beta)$ can then be estimated by taking the average of $X_{i}X_{i}^{T} \hat{Var}(\epsilon_{i}|X_{i})$ across the sample, substituting $\hat{\beta}$ in place of the unknown (true) value:

$\hat{B}(\hat{\beta}) = \sum^{n}_{i=1} X_{i}X_{i}^{T} \hat{Var}(\epsilon_{i}|X_{i})$

An alternative route to getting to the same estimator is to directly estimate $Var(X(Y-X^{T}\beta))$ by its sample variance, and using the fact that the this estimating function has mean zero.

The variance estimator we have derived here is consistent irrespective of whether the residuals in the regression model have constant variance. It is called the sandwich variance estimator because of its form in which the B matrix is sandwiched between the inverse of the A matrix.

When we suspect, or find evidence on the basis of a test for heteroscedascity, that the variance is not constant, the standard OLS variance should not be used since it gives biased estimate of precision. A by-product of this is that p-values for hypothesis tests and confidence intervals, which use the estimated variance, will not perform as they should - the type I error rate may not be correct, and the coverage rate of the confidence intervals will in general not meet their nominal level.

In this case, the sandwich estimator we have derived here can be used. This is sometimes called the robust estimator of variance, since it is robust to non-constant residual variance. An alternative to using the robust sandwich variance estimator would be to use bootstrapping.

For more details on the robust sandwich variance estimator, and semiparametric methods more generally, I recommend Tsiatis' book Semiparametric Theory and Missing Data.