The mean of the residuals in logistic regression is always zero

Last year I wrote a post about how in linear regression the mean (and sum) of the residuals always equals zero, and hence how checking that the overall mean of the residuals is zero tells you nothing about the goodness of fit of the model. Someone recently asked me whether the same is true of logistic regression. The answer is yes, for a particular definition of the residuals, as I’ll show in this post.

Suppose that we have data on n subjects. We fit a logistic regression model to a binary outcome Y_{i} with covariates X_{i}=(X_{i1},...,X_{ip})^{T}. This specifies that

    \begin{eqnarray*}E(Y|X) = P(Y=1|X) = \mbox{expit}(\alpha + \beta^{T} X)\end{eqnarray*}

where \mbox{expit}(x)=\exp(x)/(1+\exp(x)). The maximum likelihood estimates of \alpha and \beta are the values solving the so-called score equations, which are obtained by setting the derivatives of the log-likelihood function with respect to the parameters to zero.
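As a small aside, here is a minimal Python sketch of the expit function (my own illustration; the piecewise form is just one standard way to avoid floating-point overflow, and scipy.special.expit provides the same thing ready-made):

    import numpy as np

    def expit(x):
        # Numerically stable inverse logit: exp(x)/(1+exp(x)) = 1/(1+exp(-x))
        x = np.asarray(x, dtype=float)
        out = np.empty_like(x)
        pos = x >= 0
        out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))   # -x <= 0 here, so no overflow
        ex = np.exp(x[~pos])                       # x < 0 here, so exp(x) < 1
        out[~pos] = ex / (1.0 + ex)
        return out

    print(expit(np.array([-2.0, 0.0, 2.0])))       # [0.1192 0.5    0.8808]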

The likelihood function can be written as:

    \begin{eqnarray*} L(\alpha,\beta) &=& \prod^{n}_{i=1} \mbox{expit}(\alpha + \beta^{T} X_{i})^{Y_{i}} \left(1-\mbox{expit}(\alpha + \beta^{T} X_{i})\right)^{1-Y_{i}} \\ &=& \prod^{n}_{i=1} \frac{\exp(\alpha+\beta^{T} X_{i})^{Y_{i}}}{(1+\exp(\alpha+\beta^{T} X_{i}))^{Y_{i}}} \cdot \frac{1^{1-Y_{i}}}{(1+\exp(\alpha+\beta^{T} X_{i}))^{1-Y_{i}}} \\ &=&  \prod^{n}_{i=1} \frac{\exp(\alpha+\beta^{T} X_{i})^{Y_{i}}}{1+\exp(\alpha+\beta^{T} X_{i})} \end{eqnarray*}

so that the log-likelihood function is given by:

    \begin{eqnarray*} l(\alpha,\beta) &=& \sum^{n}_{i=1} \left\{ Y_{i} (\alpha+\beta^{T} X_{i}) - \log(1+\exp(\alpha+\beta^{T} X_{i})) \right\} \end{eqnarray*}

If we differentiate with respect to \alpha, we have:

    \begin{eqnarray*} \frac{\partial l(\alpha,\beta)}{\partial \alpha} &=& \sum^{n}_{i=1} \left\{ Y_{i}  - \frac{\exp(\alpha+\beta^{T} X_{i})}{1+\exp(\alpha+\beta^{T} X_{i})} \right\} \\ &=& \sum^{n}_{i=1} \left\{ Y_{i} - \mbox{expit}(\alpha+\beta^{T} X_{i}) \right\} \end{eqnarray*}

Setting this to zero, the maximum likelihood estimates \hat{\alpha} and \hat{\beta} then satisfy:

    \begin{eqnarray*} 0 &=& \sum^{n}_{i=1} \left\{ Y_{i} - \mbox{expit}(\hat{\alpha}+\hat{\beta}^{T} X_{i}) \right\} \end{eqnarray*}

If we define the residuals as \epsilon_{i} = Y_{i} - \mbox{expit}(\hat{\alpha}+\hat{\beta}^{T} X_{i}), i.e. each observed Y_{i} minus its predicted probability, this means that \sum^{n}_{i=1} \epsilon_{i} = 0, i.e. the sum (and hence also the mean) of the residuals is exactly zero. Note that this follows from the score equation for the intercept \alpha, so it relies on the model including an intercept.
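To make this concrete, here is a short, self-contained Python check (my own sketch; the simulated data, sample size and coefficient values are made up for illustration). It solves the score equations by Newton-Raphson, exactly as derived above, and confirms that the residuals sum to zero up to floating-point error:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=(n, 2))
    Z = np.column_stack([np.ones(n), X])        # design matrix with intercept
    true_coef = np.array([0.5, -1.0, 0.8])      # alpha plus two betas
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-Z @ true_coef)))

    # Newton-Raphson on the score equations: score = Z'(y - expit(Z b)),
    # Hessian = -Z' W Z with W = diag(p_i (1 - p_i))
    coef = np.zeros(3)
    for _ in range(25):
        fitted = 1.0 / (1.0 + np.exp(-Z @ coef))
        score = Z.T @ (y - fitted)
        hessian = -(Z * (fitted * (1.0 - fitted))[:, None]).T @ Z
        step = np.linalg.solve(hessian, -score)
        coef += step
        if np.max(np.abs(step)) < 1e-10:
            break

    residuals = y - 1.0 / (1.0 + np.exp(-Z @ coef))
    print(residuals.sum())                      # ~1e-13, i.e. zero numerically

Newton-Raphson is a natural choice here because, for logistic regression, it coincides with the iteratively reweighted least squares scheme that standard fitters such as R's glm use, so the fitted probabilities match what those routines would give.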

Rearranging the score equation above, we can also see that

    \begin{eqnarray*} \sum^{n}_{i=1} Y_{i} = \sum^{n}_{i=1} \mbox{expit}(\hat{\alpha}+\hat{\beta}^{T} X_{i}) \end{eqnarray*}

or equivalently that

    \begin{eqnarray*} \frac{1}{n} \sum^{n}_{i=1} Y_{i} = \frac{1}{n} \sum^{n}_{i=1} \mbox{expit}(\hat{\alpha}+\hat{\beta}^{T} X_{i}) \end{eqnarray*}

This means that the average of the predicted probabilities of Y=1 from the model is exactly equal to the observed proportion of Y=1 outcomes in the sample used to fit the model. This property has been referred to as ‘calibration in the large’ in the risk prediction literature.
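The property is easy to verify with an off-the-shelf fitter too. The following standalone sketch (again with made-up simulated data) uses statsmodels, assumed to be installed, and shows that the two averages agree:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.normal(size=1000)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.2 + 0.7 * x))))

    # Maximum likelihood logistic regression with an intercept
    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    print(y.mean(), fit.predict().mean())   # agree to numerical precision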
