Someone recently asked me what the difference was between the sample mean and the population mean. This question goes to the heart of what it means to perform statistical inference. Whatever field we are working in, we are usually interested in answering some kind of question, and often this can be expressed in terms of some numerical quantity, e.g. what is the mean income in the US? This question can be framed mathematically by saying we would like to know the value of a parameter describing some distribution. In the case of the mean US income, the parameter is the mean of the distribution of US incomes. Here the population is the US population, and the *population mean* is the mean of all the incomes in the US population. For our objective, the population mean is the *parameter* of interest.
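To make the distinction concrete, here is a minimal sketch in Python. The "population" is simulated, not real US income data: the log-normal shape, its parameters, the population size and the sample size of 500 are all assumptions chosen purely for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of incomes (simulated, NOT real data).
# The log-normal shape and its parameters are illustrative assumptions.
population = [rng.lognormvariate(10.5, 0.8) for _ in range(100_000)]

# The parameter of interest: the mean over the entire population.
population_mean = statistics.fmean(population)

# In practice we only observe a sample drawn from the population.
sample = rng.sample(population, 500)

# The sample mean is our estimate of the population mean.
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.0f}")
print(f"sample mean:     {sample_mean:.0f}")
```

The population mean is a fixed number, whereas the sample mean changes each time a different sample is drawn. Here we can compute both only because the population was simulated; with real data we would see the sample alone.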

# Jonathan Bartlett

## When is complete case analysis unbiased?

My primary research area is that of missing data. Missing data are a common issue in empirical research. Within biostatistics missing data are almost ubiquitous – patients often do not come back to visits as planned, for a variety of reasons. In surveys, participants may move between survey waves and we lose contact with them, so that we are missing their responses to the questions we would have liked to ask them.

## The miracle of the bootstrap

In my opinion one of the most useful tools in the statistician’s toolbox is the bootstrap. Let’s suppose that we want to estimate something slightly non-standard. We have written a program in our favourite statistical package to calculate the estimate. But in addition to the estimate itself, we need a measure of its precision, as given by its standard error. We saw in an earlier post how the standard error can be calculated for the sample mean. With a non-standard estimator, it may be too difficult to derive an analytical expression for an estimate of the standard error. Or in some situations it may not be worth the intellectual effort of working out an analytical standard error.
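As a rough illustration of the idea, the sketch below implements a basic nonparametric bootstrap standard error in Python: resample the data with replacement many times, recompute the estimate on each resample, and take the standard deviation of those estimates. The function name `bootstrap_se`, the simulated data, the choice of the median as the "non-standard" estimator and the 2,000 resamples are all assumptions made for this example, not something taken from a particular package.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap estimate of an estimator's standard error.

    `bootstrap_se` is a hypothetical helper written for this sketch.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = rng.choices(data, k=n)
        estimates.append(estimator(resample))
    # The spread of the estimates across resamples approximates the SE.
    return statistics.stdev(estimates)

# Simulated data, for illustration only.
rng = random.Random(1)
sample = [rng.gauss(10, 2) for _ in range(200)]

# An estimator whose standard error is awkward to derive by hand.
se_median = bootstrap_se(sample, statistics.median)

# Sanity check against a case with a known formula: the sample mean.
se_mean_boot = bootstrap_se(sample, statistics.mean)
se_mean_analytic = statistics.stdev(sample) / len(sample) ** 0.5

print(f"bootstrap SE of median: {se_median:.3f}")
print(f"bootstrap SE of mean:   {se_mean_boot:.3f}")
print(f"analytic SE of mean:    {se_mean_analytic:.3f}")
```

For the sample mean the bootstrap and analytic standard errors should be close, which gives some reassurance before relying on the bootstrap for an estimator where no simple formula is available.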