Stata/Mata’s st_view function – use with care!

December 2025 – update

Thanks to a user comment below, it seems Stata have changed st_view’s behaviour, and the two example problems in my original post below now (at least as of version 19.5) give the results I had originally expected to get.

Original post follows

I use Stata a lot, and I think it’s a great package. An excellent addition a few years ago was the Mata language, a fully fledged matrix programming language which sits on top of, but separate from, Stata’s regular dataset and command/syntax structure. Many of Stata’s built-in commands are, I believe, programmed using Mata. I’ve been using Mata quite a bit to program new commands, and in the process have come across some strange behaviour in Mata’s st_view function which I think can cause real difficulties (it did for me!). This post will hopefully help others avoid the problems I ran into.

Read more

Adjusting for optimism/overfitting in measures of predictive ability using bootstrapping

In a previous post we looked at the area under the ROC curve for assessing the discrimination ability of a fitted logistic regression model. An issue that we ignored there was that we used the same dataset to fit the model (estimate its parameters) and to assess its predictive ability.

A problem with doing this, particularly when the dataset used to fit/train the model is small, is that such estimates of predictive ability are optimistic. That is, the model will fit the dataset that was used to estimate its parameters somewhat better than it will fit new data. In some sense, this is because with small datasets the fitted model adapts to chance characteristics of the observed data which won’t occur in future data. A silly example of this would be a linear regression model of a continuous variable Y fitted to a continuous covariate X with just n=2 data points. The fitted line will simply be the line connecting the two data points, so the R squared measure will be 1 (100%), suggesting your model has perfect predictive power(!), when of course with new data it would almost certainly not have an R squared of 1.
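To make the idea concrete, here is a minimal sketch of the bootstrap optimism correction for the AUC, written in Python with scikit-learn rather than the software used in the full post; the simulated dataset, the choice of 200 bootstrap samples, and all variable names are my own illustrative assumptions, not taken from the post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Simulate a smallish dataset: 5 covariates, only the first predictive
n = 100
X = rng.normal(size=(n, 5))
p = 1 / (1 + np.exp(-0.5 * X[:, 0]))
y = rng.binomial(1, p)

# Apparent (in-sample) AUC: fit and evaluate on the same data
model = LogisticRegression().fit(X, y)
apparent_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Bootstrap estimate of optimism: refit the model in each bootstrap
# sample and compare its AUC there with its AUC on the original data
B = 200
optimism = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    mb = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], mb.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, mb.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

corrected_auc = apparent_auc - np.mean(optimism)
print(f"apparent AUC {apparent_auc:.3f}, "
      f"optimism {np.mean(optimism):.3f}, "
      f"corrected AUC {corrected_auc:.3f}")
```

In each bootstrap iteration the refitted model is evaluated both on the bootstrap sample (optimistic, since it was fitted there) and on the original data; the average gap between the two is the estimated optimism, which is subtracted from the apparent AUC.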

Read more

Multiple imputation using random forest

In recent years a number of researchers have proposed using machine learning techniques to impute missing data. One of these is the so-called random forest technique. I recently gave a talk on this topic at the International Biometric Society’s conference in Florence, Italy. In case it is of interest to anyone, the slides of the talk are available below.

Slides from talk at IBC2014 on random forest multiple imputation
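For readers who want a feel for the idea before opening the slides, here is a rough Python sketch of random-forest imputation; it is my own illustration, not the specific method covered in the talk. A forest is fitted to the complete cases, and to turn single predictions into multiple imputations it draws each missing value from one randomly chosen tree — a crude stand-in for the more careful donor-based draws used by proper multiple-imputation variants of the method.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Simulated data: x2 depends on x1, with ~30% of x2 set missing at random
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
miss = rng.random(n) < 0.3
x2_obs = np.where(miss, np.nan, x2)

# Fit a random forest for x2 given x1 using the complete cases only
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(x1[~miss].reshape(-1, 1), x2_obs[~miss])

# Generate M completed datasets; each missing value is filled in with
# the prediction of one randomly chosen tree, so imputations vary
# across datasets (proper MI methods inject this between-imputation
# variability more carefully, e.g. via donors within terminal nodes)
M = 5
X_miss = x1[miss].reshape(-1, 1)
completed_datasets = []
for _ in range(M):
    trees = rng.integers(0, len(rf.estimators_), size=X_miss.shape[0])
    draws = np.array([rf.estimators_[t].predict(X_miss[i:i + 1])[0]
                      for i, t in enumerate(trees)])
    completed = x2_obs.copy()
    completed[miss] = draws
    completed_datasets.append(completed)
```

Each completed dataset would then be analysed separately and the results pooled using Rubin’s rules, as in any multiple imputation procedure.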