Previously I wrote about how linear regression is almost always introduced and derived assuming that the covariates/regressors/independent variables are fixed quantities. As I wrote, in many studies such an assumption does not match reality, in that both the regressors and the outcome in the regression are realised values of random variables. I showed that the usual ordinary least squares (OLS) estimators are unbiased with random covariates, and that the usual standard error estimator, derived assuming fixed covariates, is unbiased with random covariates. This gives us some understanding of the behaviour of these estimators in the random covariate setting.
Here I'll take a different approach, and appeal to the powerful theory of estimating equations. It turns out that many of the statistical estimators we use can be expressed as the solutions to a set of estimating equations. The theory is powerful because it allows us to derive the asymptotic (large sample) behaviour of the estimators, and also gives us a consistent estimator of the variance of the parameter estimator, enabling us to find standard errors and confidence intervals. An excellent introduction to this theory is the article by Stefanski and Boos. For further details, I'd highly recommend Tsiatis' book Semiparametric Theory and Missing Data, which covers estimating equation theory in semiparametric models.
To recall the linear regression model, suppose we have, for each subject, an outcome $Y$ and a vector of predictors $X$. To keep the derivations simple, I will assume the first component of $X$ is 1, representing a constant intercept. The fundamental modelling assumption is then that $E(Y|X) = X^{T}\beta$, where $\beta$ is a column vector of regression coefficients to be estimated. The OLS estimator of $\beta$ is usually expressed as:

$$\hat{\beta} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y}$$
where bold $\mathbf{Y}$ and $\mathbf{X}$ respectively denote the vector and matrix containing the values of $Y$ and $X$ for all $n$ subjects in the sample. It is easy to show that the OLS estimator is also given by the value of $\beta$ which solves the following estimating equation:

$$\sum_{i=1}^{n} X_{i}(Y_{i} - X_{i}^{T}\beta) = 0$$
where $Y_{i}$ and $X_{i}$ denote the values of $Y$ and $X$ for the $i$th subject.
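As a quick numerical check of this equivalence, here is a minimal sketch in Python with NumPy. The simulated data, the coefficient values and the helper name `estimating_equation` are all illustrative assumptions of mine, not part of the derivation. It computes the OLS estimate from the matrix formula and then evaluates the left-hand side of the estimating equation at that estimate.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 500

# Random covariates: an intercept column plus two simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])  # illustrative coefficient values
Y = X @ beta_true + rng.normal(size=n)

# OLS via the matrix formula: solve (X^T X) beta = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)


def estimating_equation(beta):
    """Return sum_i X_i (Y_i - X_i^T beta), a vector with one entry per coefficient."""
    return X.T @ (Y - X @ beta)


score_at_hat = estimating_equation(beta_hat)
print("OLS estimate:", beta_hat)
print("Estimating equation at the estimate:", score_at_hat)
```

The second printed vector should be zero up to floating point error, whereas evaluating `estimating_equation` at any other value of `beta` will generally not give zero.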
Now we can begin applying the theory of estimating equations. First, under certain regularity conditions, the theory tells us that the distribution of the estimator $\hat{\beta}$ converges to a (multivariate) normal distribution as the sample size $n$ tends to infinity. Importantly, this holds irrespective of whether the residuals are normally distributed or not, and also irrespective of whether they have constant variance.
Next, the theory says that the estimator will be consistent if the estimating function, which here is $X(Y - X^{T}\beta)$, has expectation zero when evaluated at the true value of $\beta$. Consistency means that for large sample sizes, the estimator will be close to the true population parameter value with high probability. To check that this condition holds for the OLS estimators, we must find the expectation $E(X(Y - X^{T}\beta))$. To do this, we make use of the law of total expectation:

$$E(X(Y - X^{T}\beta)) = E(E(X(Y - X^{T}\beta)|X)) = E(X(E(Y|X) - X^{T}\beta))$$
If the model is correctly specified, $E(Y|X) = X^{T}\beta$, and so $E(X(E(Y|X) - X^{T}\beta)) = 0$. The estimating function thus has mean zero. We can therefore conclude the OLS estimator is consistent for $\beta$.
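Consistency can be illustrated with a small simulation. The sketch below is an illustration under assumptions of my own choosing: a single exponentially distributed covariate, skewed (centred exponential) errors with conditional mean zero, and arbitrary true coefficients. The helper `simulate_ols` is hypothetical. Nothing in the argument above requires normal errors, so the estimates should still settle near the true values as $n$ grows.

```python
import numpy as np


def simulate_ols(n, rng, beta_true):
    """Draw one sample of size n with random covariates and return the OLS estimate."""
    x = rng.exponential(size=n)            # random, non-normal covariate
    X = np.column_stack([np.ones(n), x])   # first component is the intercept
    eps = rng.exponential(size=n) - 1.0    # skewed errors with E(eps | X) = 0
    Y = X @ beta_true + eps
    return np.linalg.solve(X.T @ X, X.T @ Y)


rng = np.random.default_rng(1)
beta_true = np.array([0.5, 2.0])  # illustrative true values

errors = {}
for n in (100, 10_000, 1_000_000):
    beta_hat = simulate_ols(n, rng, beta_true)
    # Largest absolute deviation of any coefficient from its true value.
    errors[n] = np.max(np.abs(beta_hat - beta_true))
    print(n, beta_hat, errors[n])
```

With a single draw per sample size the error need not shrink monotonically, but at the largest sample size it should be very small.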
We now turn to the variance of the estimator. Making no assumptions for the moment beyond $E(Y|X) = X^{T}\beta$, estimating equation theory says that with a sample size of $n$, the estimator has (approximately, for large $n$) variance

$$n^{-1} A(\beta^{*})^{-1} B(\beta^{*}) (A(\beta^{*})^{-1})^{T}$$
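where $A(\beta)$ denotes the matrix equal to the expectation of minus the derivative of the estimating function with respect to the parameter $\beta$, $B(\beta)$ denotes the variance covariance matrix of the estimating function, and $\beta^{*}$ denotes the true value of $\beta$. First we find the second of these matrices, using the law of total variance:

$$B(\beta^{*}) = Var(X(Y - X^{T}\beta^{*})) = E(Var(X(Y - X^{T}\beta^{*})|X)) + Var(E(X(Y - X^{T}\beta^{*})|X))$$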
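Since $E(Y|X) = X^{T}\beta^{*}$, the conditional expectation $E(X(Y - X^{T}\beta^{*})|X)$ is zero, and so the second term here is zero. If we write $\epsilon = Y - X^{T}\beta^{*}$, we have

$$B(\beta^{*}) = E(Var(X\epsilon|X)) = E(X Var(\epsilon|X) X^{T})$$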
In this post, let's suppose the residuals have constant variance (I'll come back to the non-constant variance case in a later post), so that $Var(\epsilon|X) = \sigma^{2}$. Then

$$B(\beta^{*}) = E(X \sigma^{2} X^{T}) = \sigma^{2} E(XX^{T})$$
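Turning now to the matrix $A(\beta^{*})$, taking minus the derivative (with respect to $\beta$) of the estimating function, we have

$$A(\beta^{*}) = E\left(-\frac{\partial}{\partial \beta^{T}} X(Y - X^{T}\beta)\right) = E(XX^{T})$$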
Together, we thus have that the variance of the OLS estimator is equal to

$$n^{-1} E(XX^{T})^{-1} \sigma^{2} E(XX^{T}) E(XX^{T})^{-1} = n^{-1} \sigma^{2} E(XX^{T})^{-1}$$
The matrix $E(XX^{T})$ is the population (true) expectation of the product of the vector $X$ with its transpose. Similarly, $\sigma^{2}$ denotes the population residual variance. To estimate the variance in practice, we can use the expression in the previous equation, replacing $\sigma^{2}$ by its usual sample estimate, $\hat{\sigma}^{2}$. The matrix $E(XX^{T})$ can be estimated by its empirical (sample) mean

$$\hat{E}(XX^{T}) = n^{-1} \sum_{i=1}^{n} X_{i} X_{i}^{T}$$
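The variance of the OLS estimator can thus be estimated by

$$\widehat{Var}(\hat{\beta}) = n^{-1} \hat{\sigma}^{2} \left(n^{-1} \sum_{i=1}^{n} X_{i} X_{i}^{T}\right)^{-1}$$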
With a little manipulation (which I won't show here), we can see that this is identical to the variance estimator used in OLS implementations, i.e.

$$\widehat{Var}(\hat{\beta}) = \hat{\sigma}^{2} (\mathbf{X}^{T}\mathbf{X})^{-1}$$
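The manipulation amounts to noting that the two factors of $n^{-1}$ cancel and that $\sum_{i} X_{i} X_{i}^{T} = \mathbf{X}^{T}\mathbf{X}$. The following Python sketch checks the equality numerically on simulated data. The data, the variable names, and the use of the $n - p$ divisor for $\hat{\sigma}^{2}$ are my own assumptions for illustration; the identity itself holds whichever divisor is used, so long as it is the same in both forms.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3

# Simulated design with an intercept and two random covariates (illustrative only).
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=2.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)  # assumption: residual variance with n - p divisor

# Estimating-equation form: n^{-1} * sigma2_hat * (n^{-1} sum_i X_i X_i^T)^{-1}.
# The empirical mean is built explicitly from the outer products X_i X_i^T.
E_XXt_hat = np.mean(X[:, :, None] * X[:, None, :], axis=0)
var_ee = sigma2_hat * np.linalg.inv(E_XXt_hat) / n

# Textbook form: sigma2_hat * (X^T X)^{-1}.
var_ols = sigma2_hat * np.linalg.inv(X.T @ X)

print(np.max(np.abs(var_ee - var_ols)))  # difference is at floating point level
```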
We have thus shown that the usual OLS variance estimator, derived assuming the covariates are fixed, is a consistent estimator of the variance of OLS in repeated sampling in which the covariates are random.
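A small Monte Carlo experiment can illustrate this conclusion. In the sketch below (my own illustrative set-up, not something taken from the derivation), a fresh covariate $Z \sim N(0,1)$ is drawn in every replicate, so that $X = (1, Z)^{T}$ and $E(XX^{T})$ is the identity matrix. The empirical variance of $\hat{\beta}$ across replicates is then compared with the average of the usual variance estimates and with the large-sample formula $n^{-1}\sigma^{2}E(XX^{T})^{-1}$. The sample size, number of replicates and parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sims = 200, 5000
beta_true = np.array([1.0, 2.0])  # illustrative true coefficients
sigma = 1.5                        # illustrative residual standard deviation

beta_hats = np.empty((n_sims, 2))
var_hats = np.empty((n_sims, 2, 2))
for s in range(n_sims):
    # New covariates are drawn in every replicate: they are random, not fixed.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ beta_true + rng.normal(scale=sigma, size=n)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ Y
    resid = Y - X @ b
    sigma2_hat = resid @ resid / (n - 2)  # assumption: n - p divisor with p = 2
    beta_hats[s] = b
    var_hats[s] = sigma2_hat * XtX_inv    # usual OLS variance estimate

# Variance of the estimates over repeated sampling with random covariates.
empirical_var = np.cov(beta_hats, rowvar=False)
# Average of the usual variance estimator over the same replicates.
mean_estimated_var = var_hats.mean(axis=0)
# Large-sample formula n^{-1} sigma^2 E(XX^T)^{-1}; E(XX^T) is the identity here.
theoretical_var = sigma**2 / n * np.eye(2)

print(np.diag(empirical_var))
print(np.diag(mean_estimated_var))
print(np.diag(theoretical_var))
```

The three sets of diagonal entries should agree closely, up to Monte Carlo error and a small finite-sample discrepancy in the asymptotic formula.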
In a future post, I'll look at how the preceding derivations can be extended to the case where we relax the assumption that the residuals have constant variance.
A simple semiparametric model
A final note. If the model only consists of the assumptions that $E(Y|X) = X^{T}\beta$ and that the residuals have constant variance, the model is termed semiparametric. This is because although we have specified a certain aspect of the distribution of the observable random variables (specifically, how the mean of $Y$ varies as a function of $X$, and that the residual variance is constant), all other aspects of the distribution are left arbitrary. The parametric component corresponds to the finite dimensional parameters $\beta$ and $\sigma^{2}$, whilst the non-parametric component corresponds to all the other aspects of the joint distribution of $Y$ and $X$ which we have left arbitrary.