This week I was talking to a friend about how covariates with missing values are handled in structural equation modelling (SEM) software. I’ll preface this post by saying that I’m definitely not an expert (or anywhere close!) in structural equation models, so if anyone spots errors or problems, please add a comment. My friend thought that the SEM implementations in some packages can automatically accommodate missingness in covariates, using so-called ‘full information maximum likelihood’. In the following I’ll describe my subsequent exploration of how Stata’s sem command handles missingness in covariates.
Comparing predictive ability of two nested logistic regression models
A very common situation in biostatistics, but also much more broadly of course, is that one wants to compare the predictive ability of two competing models. A key question of interest often is whether adding a new marker or variable Y to an existing set X improves prediction. The most obvious way of addressing this question is to fit a regression model and then test whether adding the new variable Y improves fit, by testing the null hypothesis that the coefficient of Y in the expanded model is zero. An alternative approach is to test whether adding the new variable improves some measure of predictive ability, such as the area under the ROC curve.
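To make the first approach concrete, here is a minimal sketch in Python (not Stata) of a likelihood ratio test comparing two nested logistic regression models. Everything in it is an assumption made for illustration: the data are simulated, the names `x`, `marker_y` and `outcome` are hypothetical, and the small Newton-Raphson fitter stands in for whatever routine your statistics package provides.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_logistic(rows, outcome, iterations=25):
    """Fit a logistic regression by Newton-Raphson.

    Each row must already contain a leading 1.0 for the intercept.
    Returns (coefficients, maximised log-likelihood).
    """
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, di in zip(rows, outcome):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (di - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for xi, di in zip(rows, outcome):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += di * eta - math.log1p(math.exp(eta))
    return beta, loglik


# Simulated (hypothetical) data: existing covariate x, new marker marker_y,
# and a binary outcome that truly depends on both.
random.seed(2024)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
marker_y = [random.gauss(0, 1) for _ in range(n)]
outcome = []
for xi, yi in zip(x, marker_y):
    prob = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * xi + 0.8 * yi)))
    outcome.append(1 if random.random() < prob else 0)

# Reduced model: intercept + x. Expanded model: intercept + x + marker_y.
_, loglik_reduced = fit_logistic([[1.0, xi] for xi in x], outcome)
_, loglik_full = fit_logistic([[1.0, xi, yi] for xi, yi in zip(x, marker_y)], outcome)

# Likelihood ratio statistic; under the null (coefficient of marker_y is zero)
# it is approximately chi-squared with 1 degree of freedom.
lr_stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
# Upper tail of a chi-squared(1) variable: P(Z^2 > s) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.3g}")
```

A small p-value here is evidence against the null hypothesis that the coefficient of the new marker is zero. The Wald test of that coefficient addresses the same null hypothesis and will usually agree closely in samples of this size.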
Stata-Mata’s st_view function – use with care!
I use Stata a lot, and I think it’s a great package. An excellent addition a few years ago was the Mata language, a fully fledged matrix programming language which sits on top of, or separately from, Stata’s regular dataset and command/syntax structure. Many of Stata’s built-in commands are programmed using Mata, I believe. I’ve been using Mata quite a bit to program new commands, and in the process have come across some strange behaviour in Mata’s st_view function which I think can cause real difficulties (it did for me!). This post will hopefully help others avoid the problems I ran into.