My primary research area is missing data. Missing data are a common issue in empirical research. Within biostatistics, missing data are almost ubiquitous: patients often do not return for visits as planned, for a variety of reasons. In surveys, participants may move between survey waves or we may lose contact with them, so that we are missing their responses to the questions we would have liked to ask them.
The miracle of the bootstrap
In my opinion one of the most useful tools in the statistician’s toolbox is the bootstrap. Let’s suppose that we want to estimate something slightly non-standard. We have written a program in our favourite statistical package to calculate the estimate. But in addition to the estimate itself, we need a measure of its precision, as given by its standard error. We saw in an earlier post how the standard error can be calculated for the sample mean. With a non-standard estimator, it may be too difficult to derive an analytical expression for an estimate of the standard error. Or in some situations it may not be worth the intellectual effort of working out an analytical standard error.
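To make the idea concrete, here is a minimal sketch of a nonparametric bootstrap standard error in Python, using only the standard library. The function name `bootstrap_se`, the choice of the median as the "slightly non-standard" estimator, the simulated data, and the number of resamples are all assumptions made for illustration, not something taken from the original analysis.

```python
import random
import statistics


def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Approximate the standard error of estimator(data) by resampling.

    Hypothetical helper for illustration: draws n_boot resamples of the
    same size as the data, with replacement, and returns the standard
    deviation of the estimates across resamples.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original sample.
        resample = rng.choices(data, k=n)
        estimates.append(estimator(resample))
    # The spread of the bootstrap estimates approximates the standard error.
    return statistics.stdev(estimates)


# Simulated data, made up purely for this example.
data_rng = random.Random(1)
sample = [data_rng.gauss(50, 10) for _ in range(200)]

# Standard error of the median, for which no simple formula is at hand.
se_median = bootstrap_se(sample, statistics.median)

# Sanity check on an estimator where the analytical answer is known.
se_mean = bootstrap_se(sample, statistics.mean)
analytic_se_mean = statistics.stdev(sample) / len(sample) ** 0.5

print(f"bootstrap SE of median: {se_median:.3f}")
print(f"bootstrap SE of mean:   {se_mean:.3f}")
print(f"analytical SE of mean:  {analytic_se_mean:.3f}")
```

For the sample mean, the bootstrap and analytical standard errors should agree closely, which is a useful check before trusting the bootstrap on an estimator with no simple formula.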
Standard deviation versus standard error
A topic which many students of statistics find difficult is the difference between a standard deviation and a standard error.
The standard deviation is a measure of the variability of a random variable. For example, if we collect some data on incomes from a sample of 100 individuals, the sample standard deviation is an estimate of how much variability there is in incomes between individuals. Let’s suppose the average (mean) income in the sample is $100,000, and the (sample) standard deviation is $10,000. The standard deviation of $10,000 gives us an indication of how much incomes typically deviate from the mean of $100,000.
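For contrast, the standard error of the sample mean describes how precisely the mean itself is estimated, and for a simple random sample it is the standard deviation divided by the square root of the sample size. A small sketch using the numbers from the income example (the variable names are illustrative only):

```python
import math

n = 100               # number of individuals in the sample
sample_sd = 10_000.0  # sample standard deviation of income, in dollars

# Standard error of the sample mean: SD divided by sqrt(n).
standard_error = sample_sd / math.sqrt(n)

print(standard_error)  # 1000.0
```

So while individual incomes vary around the mean with a standard deviation of $10,000, the estimate of the mean income has a standard error of only $1,000.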