# Longitudinal and clustered data analysis books

The following are the books on longitudinal data analysis that I have found most useful.

Linear Mixed Models for Longitudinal Data, by Verbeke and Molenberghs, 2000
To my mind, this is the classic text on longitudinal data analysis, and it is probably the book I have learnt the most from on the subject. Particular highlights include:

• Inference for the variance component parameters, and in particular likelihood ratio tests when the null value lies on the boundary of the parameter space
• A discussion of how to assess whether the normality assumptions are reasonable
• Explanations of how SAS PROC MIXED can be used to fit most of the models described
• An explanation, by linking maximum likelihood estimators to GEEs, of which parts of the model are still consistently estimated when the normality assumptions are violated
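To illustrate the boundary problem: when testing whether a random effect variance is zero, the null value sits on the edge of the parameter space, so the usual chi-squared reference distribution for the likelihood ratio statistic is wrong. The sketch below assumes the standard large-sample setting in which one random effect is added to a model that already has q random effects, where the null distribution is a 50:50 mixture of chi-squared(q) and chi-squared(q + 1). The function names are hypothetical, and only the Python standard library is used.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-squared(df), integer df >= 0.

    df = 0 is treated as a point mass at zero. Uses the recurrence
    S_{k+2}(x) = S_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        sf, k = 0.0, 0  # S_0(x) = 0 for x > 0
    else:
        sf, k = math.erfc(math.sqrt(x / 2)), 1  # S_1(x)
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def boundary_lrt_pvalue(lrt_stat: float, q: int) -> float:
    """p-value for q versus q + 1 random effects, assuming the
    50:50 mixture of chi-squared(q) and chi-squared(q + 1)."""
    return 0.5 * chi2_sf(lrt_stat, q) + 0.5 * chi2_sf(lrt_stat, q + 1)


# Hypothetical example: testing a random intercept variance (q = 0).
naive = chi2_sf(3.0, 1)
adjusted = boundary_lrt_pvalue(3.0, q=0)
print(f"naive p = {naive:.4f}, mixture p = {adjusted:.4f}")
```

For a single random intercept (q = 0) the mixture simply halves the naive chi-squared(1) p-value, so ignoring the boundary makes the test conservative.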

Published 6 years later than Verbeke and Molenberghs' book on continuous outcomes, this book covers methods for discrete longitudinal data. Analysing discrete or categorical outcomes turns out to be more complicated in a number of ways than fitting linear mixed models to continuous outcomes. The book covers the two main approaches for analysing discrete data, namely the marginal (GEE) approach and the subject-specific random effects approach. Transition models, in which the distribution of the outcome at the current time point is modelled conditional on past outcomes, are also covered.
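One reason discrete outcomes are harder is that, unlike in the linear mixed model, marginal (GEE) and subject-specific (random effects) regression coefficients no longer coincide. The sketch below assumes a hypothetical random intercept logistic model, logit P(Y = 1 | b) = beta0 + beta1 * x + b with b ~ N(0, sigma^2), and made-up parameter values. It averages over the random intercept by Gauss-Hermite quadrature to get the population-averaged log odds ratio for x, and compares it with the well-known approximation beta1 / sqrt(1 + c^2 sigma^2), where c = 16 sqrt(3) / (15 pi).

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_nodes=40):
    """P(Y = 1) averaged over b ~ N(0, sigma^2), where logit P(Y = 1 | b) = eta + b.

    Gauss-Hermite nodes/weights integrate against exp(-t^2), so we substitute
    b = sqrt(2) * sigma * t and divide by sqrt(pi).
    """
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    return np.sum(w * expit(eta + np.sqrt(2.0) * sigma * t)) / np.sqrt(np.pi)


# Hypothetical subject-specific parameters.
beta0, beta1, sigma = -1.0, 1.0, 2.0

# Population-averaged log odds ratio comparing x = 1 with x = 0.
p0 = marginal_prob(beta0, sigma)
p1 = marginal_prob(beta0 + beta1, sigma)
marginal_log_or = logit(p1) - logit(p0)

# Closed-form approximation to the attenuation.
c = 16 * np.sqrt(3) / (15 * np.pi)
approx = beta1 / np.sqrt(1 + c**2 * sigma**2)

print(f"subject-specific log OR:         {beta1:.3f}")
print(f"marginal log OR (quadrature):    {marginal_log_or:.3f}")
print(f"marginal log OR (approximation): {approx:.3f}")
```

The marginal effect is attenuated towards zero, and the attenuation grows with the random intercept variance, so the two approaches answer different questions rather than giving competing estimates of the same parameter.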

Another of CRC Press's handbooks, this handbook of longitudinal data analysis probably offers the most up-to-date review of methods for longitudinal data analysis. Parts 1 and 2 cover what might be considered the mainstream material, namely parametric random effects models and semiparametric restricted mean (GEE) marginal models. A whole chapter is dedicated to the different possible targets of inference, a topic I wrote about recently.

Part 3 of the book is dedicated to nonparametric and semiparametric modelling, and covers spline models in particular. Part 4 covers joint models, which describe the joint evolution of more than one longitudinal process over time. There is also a chapter on the increasingly popular joint modelling of longitudinal and time-to-event data.

The final part covers issues arising from missing or incomplete data. Chapters cover multiple imputation and inverse probability weighting methods, and further chapters cover sensitivity analysis approaches for data that are missing not at random (MNAR). The final chapter discusses Jamie Robins' causal inference methodology for estimating the effects of time-varying exposures.
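As a rough illustration of the inverse probability weighting idea, the sketch below uses simulated data with a single follow-up outcome whose chance of being observed depends only on a binary baseline covariate (missing at random). The probability of being observed is estimated by the observed proportion in each covariate stratum, a deliberately simple choice standing in for, e.g., a logistic regression model for dropout. All data and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000

# Hypothetical data: binary baseline covariate z and a final-visit outcome y
# whose mean depends on z. The true population mean of y is 1 + 2 * 0.5 = 2.
z = rng.binomial(1, 0.5, size=n)
y = 1.0 + 2.0 * z + rng.normal(size=n)

# MAR missingness: the chance y is observed depends only on the observed z.
p_obs = np.where(z == 1, 0.3, 0.8)
r = rng.random(n) < p_obs  # True where y is observed

# Complete-case mean: biased because z = 1 subjects are observed less often.
cc_mean = y[r].mean()

# Estimate P(observed | z) by the observed proportion within each z stratum.
pi_hat = np.where(z == 1, r[z == 1].mean(), r[z == 0].mean())

# Weight each complete case by 1 / pi_hat and take a normalised weighted mean.
w = r / pi_hat
ipw_mean = np.sum(w[r] * y[r]) / np.sum(w[r])

print(f"complete-case mean: {cc_mean:.3f}")
print(f"IPW mean:           {ipw_mean:.3f}")
```

Only the observed outcomes enter the weighted mean, so the same code works when the unobserved values are genuinely missing; the correction relies entirely on the dropout model being correctly specified.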