Linear regression is one of the workhorses of statistical analysis, permitting us to model how the expectation of an outcome Y depends on one or more predictors (or covariates, regressors, independent variables) X. Previously I wrote about the assumptions required for the validity of ordinary linear regression estimates and their inferential procedures (tests, confidence intervals), assuming (as we often do) that the residuals are normally distributed with constant variance.

In my experience, when the ordinary least squares (OLS) estimators and their inferential procedures are taught to students, the predictors X are treated as fixed. That is, we act as if they are controlled by the design of our study or experiment. In practice, in many studies the predictors are not under the control of the investigator, but are random variables like the outcome Y. As an example, suppose we are interested in finding out how blood pressure (BP) depends on weight. We take a random sample of 1000 individuals and measure their BP and weight. We then fit a linear regression model with Y=BP and X=weight. The predictor X here is certainly not under the control of the investigator, and so it seems to me quite reasonable to ask: why is it valid to use inferential procedures which treat X as fixed, when in my study X is random?

First, let's consider the unbiasedness of the OLS estimators. Recall that the OLS estimators can be written as

$$\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$$

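where the bold $\mathbf{Y}$ and $\mathbf{X}$ denote the vector of outcomes and the matrix of predictors, respectively, for our sample of subjects. Then under the assumption that $E(\mathbf{Y}|\mathbf{X}) = \mathbf{X}\beta$, the OLS estimator is easily shown to be conditionally (on $\mathbf{X}$) unbiased, since:

$$E(\hat{\beta}|\mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E(\mathbf{Y}|\mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\beta = \beta$$
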
But what if our predictors are random? Then we are interested in the unconditional expectation of the estimator. In this case we can simply appeal to the law of total expectation to see that

$$E(\hat{\beta}) = E\{E(\hat{\beta}|\mathbf{X})\} = E(\beta) = \beta$$

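As a rough check of this argument, the following Python sketch (assuming NumPy is available) simulates repeated studies in which the predictor is redrawn in every sample, so X is random rather than fixed. The true coefficients, sample size, normal predictor and normal errors are arbitrary illustrative choices, not values from any real study. Averaged over replicates, the OLS estimates should sit close to the true coefficients.

```python
import numpy as np

def ols(X, y):
    """OLS estimate beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def simulate_beta_hats(n_reps=5000, n=50, beta=(1.0, 2.0), sigma=1.0, seed=0):
    """Draw a fresh random X and Y in every replicate and return the OLS estimates."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta)
    estimates = np.empty((n_reps, beta.size))
    for r in range(n_reps):
        x = rng.normal(loc=0.0, scale=1.0, size=n)      # predictor is random, not fixed by design
        X = np.column_stack([np.ones(n), x])            # design matrix with an intercept column
        y = X @ beta + rng.normal(scale=sigma, size=n)  # outcome satisfies E(Y|X) = X beta
        estimates[r] = ols(X, y)
    return estimates

est = simulate_beta_hats()
print(est.mean(axis=0))  # should be close to the true values [1.0, 2.0]
```

Swapping the normal predictor for some other continuous distribution should leave the average essentially unchanged, since the argument only uses $E(\mathbf{Y}|\mathbf{X}) = \mathbf{X}\beta$ and not any particular distribution for X.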
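What about the sampling variability of the OLS estimator? Assuming the residuals are uncorrelated with constant variance $\sigma^2$, so that $\text{Var}(\mathbf{Y}|\mathbf{X}) = \sigma^2\mathbf{I}$, we can find its variance conditional on the observed values of the predictors by

$$\text{Var}(\hat{\beta}|\mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\,\text{Var}(\mathbf{Y}|\mathbf{X})\,\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$$
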
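To see the conditional variance formula at work, this sketch (again NumPy, with hypothetical parameter values) draws the predictor once and then holds it fixed, regenerating only Y. The sample covariance of the resulting estimates should approximately match $\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 1.5
beta = np.array([1.0, 2.0])  # illustrative true intercept and slope

# Draw X once and then hold it fixed: we are conditioning on the observed predictors.
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=n)])
XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T  # maps an outcome vector y to its OLS estimate

# Theoretical conditional variance matrix sigma^2 (X^T X)^{-1}
theoretical = sigma**2 * XtX_inv

# Empirical check: regenerate only Y many times, refit, and take the sample covariance.
reps = 20000
betas = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    betas[r] = H @ y
empirical = np.cov(betas, rowvar=False)

print(np.round(theoretical, 4))
print(np.round(empirical, 4))
```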
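In software, the variances of the OLS estimates are calculated using this formula, with the observed matrix $\mathbf{X}$ and the sample estimate $\hat{\sigma}^2$ of the residual variance. But what about the unconditional variance of the estimator? Using the law of total variance we find that

$$\text{Var}(\hat{\beta}) = E\{\text{Var}(\hat{\beta}|\mathbf{X})\} + \text{Var}\{E(\hat{\beta}|\mathbf{X})\} = E\{\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\} + \text{Var}(\beta) = \sigma^2 E\{(\mathbf{X}^T\mathbf{X})^{-1}\}$$
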
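The unconditional variance of the OLS estimator is therefore the average, across samples in which X and Y are both random, of the variance the OLS estimator would have for fixed X. Moreover, since $E(\hat{\sigma}^2|\mathbf{X}) = \sigma^2$, we have $E\{\hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}\} = E\{E(\hat{\sigma}^2|\mathbf{X})(\mathbf{X}^T\mathbf{X})^{-1}\} = \sigma^2 E\{(\mathbf{X}^T\mathbf{X})^{-1}\}$. It follows that the usual OLS variance estimator, derived assuming fixed X, is an unbiased estimator of the unconditional variance of the OLS estimator.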
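A simulation sketch of this result, assuming a standard normal predictor and normal errors with σ = 2 (both hypothetical choices): X is redrawn in every replicate, and for each sample we record the OLS estimate together with the usual fixed-X variance estimate $\hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}$. The empirical variance of the estimates across replicates (the unconditional variance) should be close to the average of the fixed-X variance estimates.

```python
import numpy as np

def fit_ols(X, y):
    """Return the OLS estimate and the usual fixed-X variance estimate sigma_hat^2 (X^T X)^{-1}."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)  # unbiased estimate of the residual variance
    return beta_hat, sigma2_hat * XtX_inv

rng = np.random.default_rng(2)
n, reps, sigma = 30, 20000, 2.0
beta = np.array([0.5, -1.0])  # illustrative true coefficients

beta_hats = np.empty((reps, 2))
var_hats = np.empty((reps, 2))
for r in range(reps):
    # X is redrawn in every replicate, so it is random rather than fixed by design.
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b, V = fit_ols(X, y)
    beta_hats[r] = b
    var_hats[r] = np.diag(V)  # estimated variances of intercept and slope

# Unconditional variance of beta_hat across samples (X and Y both random)...
print(beta_hats.var(axis=0, ddof=1))
# ...versus the average of the usual fixed-X variance estimates.
print(var_hats.mean(axis=0))
```

For a standard normal predictor, $\sum_i (x_i - \bar{x})^2$ follows a $\chi^2_{n-1}$ distribution, so the unconditional variance of the slope estimator is $\sigma^2/(n-3)$, about 0.148 with the values used here; both printed slope entries should be near that figure.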

I think your discussion focused mainly on the properties of the estimator rather than on the fundamental differences between fixed and random regressors. Here are some of my thoughts on this issue:

If we have stochastic regressors, we are drawing random pairs $(X_i, Y_i)$, $i = 1, \dots, n$, the so-called random sample, from a fixed but unknown probability distribution $f(X, Y)$. Theoretically speaking, the random sample allows us to learn about, or estimate, some parameters of the distribution $f(X, Y)$.

If we have fixed regressors, theoretically speaking, we can only infer certain parameters of the conditional distributions $f(Y|X = x_i)$, $i = 1, \dots, n$, where each $x_i$ is not a random variable, i.e. is fixed. More specifically, stochastic regressors allow us to estimate some parameters of the entire distribution of $(X, Y)$, while fixed regressors only let us estimate certain parameters of the conditional distributions $f(Y|X = x_i)$.

The consequence is that inferences based on fixed regressors cannot be generalized to the whole distribution. For example, if we only had  in the sample as fixed regressors, we cannot infer anything about 100 or 99.9, whereas with stochastic regressors we can.

Nevertheless, I am still unsure about the validity of my argument. Would you mind providing more insight on this topic?

Thanks Kun for your very thought-provoking comment! If we make no assumptions about how f(Y|X) depends on X, then I agree we can only estimate the conditional distribution of Y at those values of X which are used in the sample. However, let's suppose we are interested in modelling E(Y|X), i.e. how the mean of Y varies with X. Next, suppose we are willing to assume that E(Y|X) is linear in X. In this case we can identify the unknown parameters in this conditional mean model so long as we have observations of Y at at least two distinct values of X. The linearity assumption thus enables us to draw inferences about E(Y|X) for all values of X, but obviously the validity of this relies on the linearity assumption.
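The identification point can be sketched numerically. In this hypothetical example (NumPy, illustrative coefficients) the sample contains only two distinct values of X; the design matrix still has full column rank, so the intercept and slope are estimable, and under linearity we can estimate E(Y|X) at values such as 100 or 99.9 that never occur in the data. With only one distinct value, the design matrix is rank-deficient and the slope is not identified.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([3.0, 0.5])  # hypothetical true intercept and slope

# Only two distinct predictor values appear in the sample (20 observations at each).
x = np.repeat([1.0, 2.0], 20)
X = np.column_stack([np.ones_like(x), x])
y = X @ beta + rng.normal(scale=1.0, size=x.size)

print(np.linalg.matrix_rank(X))  # 2: intercept and slope are identified

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Under the linearity assumption we can estimate E(Y | X = x0) for any x0,
# including values such as 100 or 99.9 that never appear in the sample.
for x0 in (100.0, 99.9):
    print(x0, beta_hat[0] + beta_hat[1] * x0)

# With a single distinct value of X, the two columns of the design matrix coincide,
# so the matrix is rank-deficient and the slope cannot be identified from the data.
X_one = np.column_stack([np.ones(20), np.full(20, 1.0)])
print(np.linalg.matrix_rank(X_one))  # 1
```

Such extrapolated estimates are only as good as the linearity assumption, and they are also imprecise: the standard error of the fitted mean grows with the distance of the target value from the observed values of X.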