When taking a course on likelihood based inference, one of the key topics is that of testing and confidence interval construction based on the likelihood function. Usually the Wald, likelihood ratio, and score tests are covered. In this post I'm going to revise the advantages and disadvantages of the Wald and likelihood ratio test. I will focus on confidence intervals rather than tests, because the deficiencies of the Wald approach are more transparently seen here.

## Adjusting for baseline covariates in randomized controlled trials

Randomized controlled trials constitute what are generally considered to be the gold standard design for evaluating the effects of some intervention or treatment of interest. The fact that participants are randomized to the two (sometimes more) groups ensures that, at least in expectation, the two treatment groups are balanced in respect of both measured, and importantly, unmeasured factors which may influence the outcome. As a consequence, differences in outcomes between the two groups can be attributed to the effect of being randomized to the treatment rather than the control (which often would be another treatment).

## R squared and goodness of fit in linear regression

I've been teaching a modelling course recently, and have been reading and thinking about the notion of goodness of fit. R squared, the proportion of variation in the outcome Y, explained by the covariates X, is commonly described as a measure of goodness of fit. This of course seems very reasonable, since R squared measures how close the observed Y values are to the predicted (fitted) values from the model.

## R squared and adjusted R squared

One quantity people often report when fitting linear regression models is the R squared value. This measures what proportion of the variation in the outcome Y can be explained by the covariates/predictors. If R squared is close to 1 (unusual in my line of work), it means that the covariates can jointly explain the variation in the outcome Y. This means Y can be accurately predicted (in some sense) using the covariates. Conversely, a low R squared means Y is poorly predicted by the covariates. Of course, an effect can be substantively important but not necessarily explain a large amount of variance - blood pressure affects the risk of cardiovascular disease, but it is not a strong enough predictor to explain a large amount of variation in outcomes. Put another way, knowing someone's blood pressure can't tell you with much certainty whether a particular individual will suffer from cardiovascular disease.

## The robust sandwich variance estimator for linear regression (theory)

In a previous post we looked at the properties of the ordinary least squares linear regression estimator when the covariates, as well as the outcome, are considered as random variables. In this post we'll look at the theory sandwich (sometimes called robust) variance estimator for linear regression. See this post for details on how to use the sandwich variance estimator in R.