Running simulation studies in R

In my work, and indeed in blog posts on this site, I often perform simulation studies. They can be invaluable for exploring and testing the performance of statistical methods under different conditions. Recently Tim Morris, Ian White and Michael Crowther published an excellent paper in Statistics in Medicine, freely available here, on how to plan and run simulation studies. The paper contains a wealth of useful guidance and advice, and in particular highlights how inappropriate setting of random number seeds can cause things to go wrong!

Tim has an accompanying GitHub repository with Stata code for the illustrative example from the paper, in which they simulate survival data and analyse it using a number of different survival regression models. As part of the new MSc in Data Science & Statistics here at the University of Bath, I've put together a short introductory tutorial on performing simulation studies using R. It can be accessed here. I hope it gives a good introduction to the key elements of programming up a simulation study in R. If anyone has comments on it, or thinks I've omitted something important that should be covered, please get in touch via email or a comment on this page.
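The tutorial itself is written in R. As a language-neutral illustration of the seed advice, here is a minimal Python/NumPy sketch of a toy simulation study. It is not taken from the tutorial or from Morris et al.: the data-generating mechanism (normal data with a known mean), the method being evaluated (the sample mean), the function name `run_simulation` and its default values are all illustrative assumptions. The seed is set once before the loop, and the bias and empirical standard error are reported together with their Monte Carlo standard errors.

```python
import numpy as np


def run_simulation(n_sim=1000, n=100, true_mean=1.0, seed=2018):
    """Toy simulation study (illustrative): performance of the sample mean."""
    # Set the seed ONCE, before the loop, so all repetitions come from a single
    # random number stream and the whole study can be reproduced exactly.
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_sim)
    for i in range(n_sim):
        # Data-generating mechanism (assumed): n draws from N(true_mean, 1)
        y = rng.normal(loc=true_mean, scale=1.0, size=n)
        # Method under evaluation (assumed): the sample mean
        estimates[i] = y.mean()
    bias = estimates.mean() - true_mean
    emp_se = estimates.std(ddof=1)
    # Monte Carlo standard errors quantify simulation uncertainty
    bias_mcse = emp_se / np.sqrt(n_sim)
    emp_se_mcse = emp_se / np.sqrt(2 * (n_sim - 1))
    return {
        "bias": float(bias),
        "bias_mcse": float(bias_mcse),
        "emp_se": float(emp_se),
        "emp_se_mcse": float(emp_se_mcse),
    }


if __name__ == "__main__":
    print(run_simulation())
```

By contrast, re-setting the seed to the same value inside the loop would make every simulated dataset identical, so the repetitions would carry no information about sampling variability.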

Critical bug fix for smcfcs in Stata

At a recent missing data course run by a colleague, users of my multiple imputation program smcfcs in Stata 15.1 found that, when imputing a simulated dataset, smcfcs took much longer to run and issued many more rejection sampling warnings than it did for those using Stata 14.1. Moreover, the point estimates for the substantive/analysis model obtained in Stata 15.1 were dramatically different from those obtained in Stata 14.1, with the former being heavily biased relative to the true parameter values.


Comment on 'Conditional estimation and inference to address observed covariate imbalance in randomized clinical trials'

Thanks to Tim Morris for letting me know about a paper just published in the journal Clinical Trials by Zhang et al, titled 'Conditional estimation and inference to address observed covariate imbalance in randomized clinical trials'. Zhang et al propose so-called conditional estimation and inference to address observed covariate imbalance in randomised trials. They introduce the setup of randomised trials with covariates X, randomised treatment T and outcome Y. They begin with a framework that treats all three as random in repeated sampling, and review the unadjusted estimator of the marginal mean difference in outcome and a covariate-adjusted estimator based on earlier work by Tsiatis and others.
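To make the two estimators concrete, here is a minimal Python sketch with a continuous outcome. The unadjusted estimator is the difference in mean outcome between arms. For the adjusted estimator, a simple ANCOVA-style regression of Y on T and X is used as a stand-in; the estimator of Tsiatis and others is more general, so this is an illustrative assumption rather than their method, and the function names are hypothetical.

```python
import numpy as np


def unadjusted_estimate(y, t):
    """Difference in mean outcome between treated (t == 1) and control (t == 0)."""
    return y[t == 1].mean() - y[t == 0].mean()


def ancova_estimate(y, t, x):
    """Coefficient of t in an OLS regression of y on an intercept, t and x.

    Illustrative stand-in for a covariate-adjusted estimator; not the
    Tsiatis et al. estimator itself.
    """
    design = np.column_stack([np.ones_like(y), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    # Hypothetical trial: treatment effect 2, prognostic covariate x
    rng = np.random.default_rng(1)
    n = 500
    x = rng.normal(size=n)
    t = rng.binomial(1, 0.5, size=n)
    y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)
    print(unadjusted_estimate(y, t), ancova_estimate(y, t, x))
```

Both estimators target the same marginal mean difference here. Because x is prognostic, adjusting for it typically yields a less variable estimate and corrects for chance imbalance in x between arms.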


Combining bootstrapping and multiple imputation under uncongeniality

Tomorrow I'm giving a talk (slides here) at the Joint Statistical Meetings in Vancouver on some work I've been doing on combining bootstrapping with multiple imputation (MI), something I've written about here before. That post looked at a recent paper by Schomaker and Heumann (2018) on various ways of combining bootstrapping and MI. A more recent post discussed an arXiv paper by von Hippel (2018) on maximum likelihood multiple imputation, which also contains a nice proposal for combining bootstrapping and MI. My talk this week is about how these methods perform when the imputation and analysis models are not congenial.
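For readers unfamiliar with the idea, here is a minimal Python sketch of one simple "bootstrap, then impute" combination with a percentile interval, in the spirit of the approaches compared by Schomaker and Heumann. It is not the specific method presented in the talk. Several assumptions are made: the estimand is the mean of an incomplete outcome y, y is missing at random given a fully observed x, and imputations are drawn from a normal linear regression using maximum likelihood parameter estimates. One rationale for the last choice is that refitting the imputation model in each bootstrap sample already propagates parameter uncertainty. All function names are hypothetical.

```python
import numpy as np


def mi_mean(y, x, n_imp, rng):
    """Mean of y averaged over n_imp stochastic regression imputations (ML parameters)."""
    obs = ~np.isnan(y)
    miss = ~obs
    design = np.column_stack([np.ones(obs.sum()), x[obs]])
    coef, *_ = np.linalg.lstsq(design, y[obs], rcond=None)
    sigma = np.sqrt(np.mean((y[obs] - design @ coef) ** 2))  # ML residual SD
    mu_mis = coef[0] + coef[1] * x[miss]
    estimates = []
    for _ in range(n_imp):
        y_imp = y.copy()
        y_imp[miss] = mu_mis + rng.normal(scale=sigma, size=miss.sum())
        estimates.append(y_imp.mean())
    return float(np.mean(estimates))


def boot_then_impute(y, x, n_boot=200, n_imp=5, alpha=0.05, seed=1):
    """Point estimate from MI on the original data; percentile CI from bootstrap-then-impute."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot_ests = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boot_ests[b] = mi_mean(y[idx], x[idx], n_imp, rng)
    point = mi_mean(y, x, n_imp, rng)
    lower, upper = np.quantile(boot_ests, [alpha / 2, 1 - alpha / 2])
    return point, (float(lower), float(upper))
```

Here the imputation model and the analysis (the mean of y) are congenial. The question the talk addresses is what happens to such combinations when they are not.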


Missing not at random sensitivity analysis with FCS multiple imputation

Daniel Tompsett and colleagues have recently published a paper (open access here) on performing missing not at random (MNAR) sensitivity analyses within the fully conditional specification (FCS) framework for multiple imputation (MI). A number of previous papers had explored versions of the approach, and Tompsett et al bring these together to formalise the basis for the approach (which they term NARFCS) and importantly how to choose values of the sensitivity parameters involved.
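In the simplest case of a single incomplete continuous variable, including a sensitivity parameter times the missingness indicator in the imputation model amounts to shifting the imputed values by that parameter (often called delta-adjustment). The following Python sketch illustrates this under stated assumptions. The outcome y is incomplete and x is fully observed. Imputation uses a normal linear regression with parameters drawn from their posterior under a noninformative prior. The estimand is the mean of y. The function names and the values of delta are hypothetical. This is not the NARFCS software described by Tompsett et al.

```python
import numpy as np


def delta_adjusted_impute(y, x, delta, rng):
    """One imputation of missing y from a normal regression on x, shifted by delta.

    delta = 0 corresponds to MAR; delta != 0 says missing values are on average
    delta higher than a MAR imputation would predict (an MNAR departure).
    """
    obs = ~np.isnan(y)
    miss = ~obs
    X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
    n_obs, p = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y[obs]
    rss = np.sum((y[obs] - X_obs @ beta_hat) ** 2)
    # Draw sigma^2 and beta from their posterior (noninformative prior)
    sigma2 = rss / rng.chisquare(n_obs - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    X_mis = np.column_stack([np.ones(miss.sum()), x[miss]])
    y_imp = y.copy()
    y_imp[miss] = X_mis @ beta + delta + rng.normal(scale=np.sqrt(sigma2), size=miss.sum())
    return y_imp


def sensitivity_analysis(y, x, deltas, n_imp=20, seed=1):
    """Pooled (Rubin's rules) point estimate of mean(y) for each delta."""
    results = {}
    for delta in deltas:
        # Same seed for every delta, so estimates differ only through delta
        rng = np.random.default_rng(seed)
        ests = [delta_adjusted_impute(y, x, delta, rng).mean() for _ in range(n_imp)]
        results[delta] = float(np.mean(ests))
    return results
```

Reporting the estimate across a range of plausible delta values shows how sensitive conclusions are to departures from MAR. Choosing that range is the part Tompsett et al give guidance on.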


Multiple imputation when estimating relative risks

Sullivan and colleagues have recently published a nice paper exploring multiple imputation for missing covariates or outcomes when one is interested in estimating relative risks. They performed simulations in which missing covariates or outcomes were imputed using either multivariate normal imputation or fully conditional specification (FCS) imputation, and in which the true outcome model was a log link binomial model. They concluded that multivariate normal imputation performed poorly, producing estimated coefficients that were biased towards the null. FCS imputation performed better, although its estimates were still biased in certain situations.
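As a reminder of what a log link binomial model implies, here is a small Python sketch. It assumes a single binary exposure x, so that log P(Y=1|x) = b0 + b1*x and the relative risk is exp(b1). With one binary covariate, the maximum likelihood estimate of exp(b1) equals the ratio of the observed risks in the two groups. The parameter values and function names are hypothetical and are not those used by Sullivan and colleagues.

```python
import numpy as np


def simulate_log_binomial(n, b0, b1, rng):
    """Simulate binary exposure x and outcome y with log P(Y=1|x) = b0 + b1*x."""
    # A log link needs b0 <= 0 and b0 + b1 <= 0 so that both risks are <= 1
    assert b0 <= 0 and b0 + b1 <= 0
    x = rng.binomial(1, 0.5, size=n)
    p = np.exp(b0 + b1 * x)
    y = rng.binomial(1, p)
    return x, y


def relative_risk(x, y):
    """Ratio of observed risks, exposed (x == 1) versus unexposed (x == 0)."""
    return y[x == 1].mean() / y[x == 0].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x, y = simulate_log_binomial(20000, np.log(0.2), np.log(2.0), rng)
    print(relative_risk(x, y))  # should be close to the true RR of 2
```

An imputation model that distorts the relationship between x and y in the imputed records, for example through a normality assumption that does not hold, will pull this ratio towards 1. That is consistent with the bias towards the null they report.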
