Adjusting for optimism/overfitting in measures of predictive ability using bootstrapping

In a previous post we looked at the area under the ROC curve for assessing the discrimination ability of a fitted logistic regression model. An issue that we ignored there was that we used the same dataset to fit the model (estimate its parameters) and to assess its predictive ability.

A problem with doing this, particularly when the dataset used to fit/train the model is small, is that such estimates of predictive ability are optimistic. That is, the model will fit the dataset used to estimate its parameters somewhat better than it will fit new data. In some sense, this is because with small datasets the fitted model adapts to chance characteristics of the observed data which won’t occur in future data. A silly example of this would be a linear regression model of a continuous variable Y fitted to a continuous covariate X with just n=2 data points. The fitted line will just be the line connecting the two data points. In this case, the R squared measure will be 1 (100%), suggesting your model has perfect predictive power(!), when of course with new data it would almost certainly not have an R squared of 1.
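The n=2 example can be checked with a small simulation. The following is only a sketch under assumptions that are not part of the post: the data-generating model (Y = 0.5·X + standard normal noise), the seed, and the helper names `fit_line`, `r_squared` and `simulate` are all made up for illustration.

```python
import random

def fit_line(x, y):
    """Least-squares fit of y on a single covariate x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y, intercept, slope):
    """R squared of a given line on data (x, y): 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def simulate(n, rng):
    """Hypothetical data-generating model (an assumption): Y = 0.5 * X + noise."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    return x, y

rng = random.Random(1)

# Fit the model to just n=2 observations.
x_train, y_train = simulate(2, rng)
intercept, slope = fit_line(x_train, y_train)

# Apparent R squared: evaluated on the same two points used for fitting.
r2_train = r_squared(x_train, y_train, intercept, slope)

# R squared of the same fitted line on 1000 new observations.
x_new, y_new = simulate(1000, rng)
r2_new = r_squared(x_new, y_new, intercept, slope)

print("R squared on the fitting data:", r2_train)
print("R squared on new data:", r2_new)
```

The apparent R squared equals 1 up to floating-point error, because a line through two points leaves no residual. On the new data it is much lower, and it can even be negative if the fitted line predicts worse than the mean of the new Y values.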


Wilcoxon-Mann-Whitney as an alternative to the t-test

The two sample t-test is one of the most used statistical procedures. Its purpose is to test the hypothesis that the means of two groups are the same. The test assumes that the variable in question is normally distributed in the two groups. When this assumption is in doubt, the non-parametric Wilcoxon-Mann-Whitney (or rank sum) test, which doesn’t rely on distributional assumptions, is sometimes suggested as an alternative to the t-test (e.g. on the Wikipedia page on the t-test). But is this necessarily a good ‘replacement’?
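To make the two procedures concrete, here is a minimal sketch that computes both test statistics for two small samples. The data values and the helper names `pooled_t_statistic` and `mann_whitney_u` are invented for illustration. The t statistic shown is the equal-variance (pooled) version, which is one common form of the two sample t-test and not necessarily the one the post has in mind.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Two sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance, n-1 denominator
    var_b = statistics.variance(b)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

def mann_whitney_u(a, b):
    """U statistic for sample a: the number of (a_i, b_j) pairs with
    a_i > b_j, counting ties as one half."""
    u = 0.0
    for ai in a:
        for bj in b:
            if ai > bj:
                u += 1.0
            elif ai == bj:
                u += 0.5
    return u

# Made-up data for two groups (an assumption, not from the post).
group_a = [1.1, 2.3, 2.9, 3.8, 4.0]
group_b = [2.0, 3.5, 4.4, 5.1, 6.2]

t_stat = pooled_t_statistic(group_a, group_b)
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

print("t statistic:", t_stat)
print("U for group a:", u_a)  # 6.0: six of the 25 pairs have a_i > b_j
print("U for group b:", u_b)  # 19.0: U_a + U_b = 5 * 5 when there are no ties
```

The t statistic depends on the actual values through the group means and variances, whereas U depends only on how the observations in one group are ordered relative to those in the other.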
