I've recently been working on some simulation studies in R involving computationally intensive MCMC sampling. Ordinarily I would run these on my institution's computing cluster, making use of its large number of cores, but a temporary lack of availability led me to investigate Amazon Web Services (AWS) instead. In this post I'll describe the steps I went through to get my simulations running in R. As background, I am mainly a Windows user, and had never really used the Linux operating system. Nonetheless, the process wasn't actually too tricky in the end, and it has enabled me to complete the simulations far more quickly than if I'd just used my desktop's 8 cores. The advantage of using a cloud computing resource (from my perspective) is that in principle you can use as little or as much computing power as you need or want, and it is always available - you don't have to compete with other users' demands, as would typically be the case on an academic institution's computing cluster.
In the process of organising a conference session on machine learning, I've finally got around to reading the late Leo Breiman's thought-provoking 2001 Statistical Science article "Statistical Modeling: The Two Cultures". I highly recommend reading the paper, and the discussion that follows it. In the paper Breiman argues that statistics as a field should open its eyes to analysing data not only with traditional 'data models' (his terminology), by which he means standard (usually parametric) probabilistic models, but also with so-called algorithmic techniques from machine learning.
Yesterday we were very pleased to welcome Prof. Alan Agresti, from the University of Florida, to give a departmental seminar at the London School of Hygiene & Tropical Medicine. Prof. Agresti gave a very interesting seminar, covering a wide range of topics which came out of writing his most recent book - Foundations of Linear and Generalized Linear Models. Among these were some of the problems that can arise when ordinal outcomes are modelled using linear regression models. He then discussed a proposal for a new way of representing covariate effects in ordinal regression models, a so-called superiority measure. Next he discussed some of the problems that can arise with Wald-based inferences, a topic I've touched upon before here. Prof. Agresti then discussed some issues with residuals in GLMs, and some recently developed methods for modelling multivariate data using generalized estimating equations.
An audio/slide recording of Prof. Agresti's seminar is available here.
A friend of mine has been telling me for some time that I should try using Git/GitHub to keep track of my files. In this post I'll give a bit of an overview of these (or at least my understanding of them so far!), and some of my experiences of using them. Some of the terminology is (in my experience) a bit confusing, so I'll attempt to give an intuitive introduction. For any Git experts out there, please put me straight (in a comment) if I've got something wrong!
Via a tweet I just came across the following article on the UK's NHS Choices website. It raises doubts about the predictive value of a new test for Alzheimer's disease, published in a paper here. The model aims to predict whether those suffering from mild cognitive impairment will progress to Alzheimer's disease (AD) within the following year.