Automatic convergence checking in Bayesian inference with runjags

I’ve been performing some simulation studies comparing a Bayesian to a more traditional frequentist estimation approach in a particular problem. To do this I’ve been using the excellent JAGS package, calling it from R. One of the issues I’ve faced is the question of how long to run the MCMC sampler in the Bayesian approach. Use too few iterations, and the chains will not have converged to their stationary distribution, such that the samples will not be from the posterior distribution of the model parameters. In regular data analysis situations, one can make use of the extensive diagnostic toolkit which has been developed over the years. The most popular of these is I believe to examine trace plots from multiple chains, started with dispersed initial values, and also Gelman and Rubin’s Rhat measure.

But when it comes to a simulation study, one clearly cannot examine trace plots for every simulated dataset! Fortunately, I have just come across the excellent runjags R package by Matthew Denwood, which interfaces with JAGS. As well as nice functionality for fitting your desired model and summarizing various aspects of the posterior distributions, the package has an extremely useful autorun.jags function. This fits your model, and runs enough iterations until the Gelman and Rubin statistic is less than 1.05 (by default) for all model parameters. It then runs additional iterations until the desired percentiles of the posterior distributions are estimated with sufficient (again with default choices) accuracy.

A further attractive feature is the ability to run multiple chains in parallel, i.e. exploiting the multi-core functionality of most modern computers, and thus dramatically reducing run times. This feature is available also in the R2jags package, but I personally had difficulties getting the parallel version of this working.

So for my simple example, the R code is:

modelFit <- autorun.jags(model="lin_reg_1.bug", monitor=jags.params, 
     data=jags.data, n.chains=5, startsample=4000, method="parallel", psrf.target=1.02)

which specifies that I want the sampler to be run with 5 chains, until Gelman and Rubin's Rhat is less than 1.02 for all model parameters. More detailed information about the runjags package can be found in this draft paper.

Leave a ReplyCancel reply