Linear mixed models are a popular modelling approach for longitudinal or repeated measures data. They extend standard linear regression models through the introduction of random effects and/or correlated residual errors. In the context of randomised trials which repeatedly measure patients over time, linear mixed models are a popular approach of analysis, not least because they handle missing data in the outcome ‘automatically’, under the missing at random assumption. Because of this a mixed model analysis has in many cases become the default method of analysis in clinical trials with a repeatedly measured outcome.
MMRM
Particularly within the pharmaceutical trials world, the term MMRM (mixed model repeated measures) is often used. Typically this model specifies no patient level random effects, but instead models the correlation within the repeated measures over time by specifying that the residual errors are correlated. In particular, to reduce the chances of model misspecification, commonly the residual errors are assumed to be from a multivariate normal distribution with a so called unstructured covariance matrix. This imposes no restriction on the form of the correlation matrix of the repeated measures.
For the so called ‘fixed effects’, one typically specifies effects of time (as a categorical or factor variable), randomised treatment group, and their interaction. This implies a saturated model for the mean, or put another way, there is a separate mean parameter for each time point in each treatment group. Often there are baseline covariates to be adjusted for. One can adjust for these as simple main effects, or additionally with an interaction with time, in order to allow for the association between the baseline variable(s) and outcome to potential vary over time. For a more in depth discussion of the model, see for example Molenberghs et al 2004 (open access).
My personal journey with statistical software started with Stata and SAS, with a little R. I thus first learnt how to fit such models in Stata and SAS, and only later in R. In this post I’m going to review how to fit the MMRM model to clinical data in all three packages, which may be of use to those who similarly switch between these software packages and need to fit such models. First, we’ll simulate a dataset in R which we will then analyse in each package.
Simulating a clinical trial dataset
To illustrate fitting the MMRM in the three packages, we will simulate a dataset with a continuous baseline covariate and three followup visits. We will introduce some (monotone) dropout, leading to missing data, which will satisfy the missing at random assumption. Specifically, we will simulate that some patients dropout before visit 1, dependent on their baseline covariate value. At each subsequent followup visit, dropout will be simulated among those still in the study dependent on the change in the outcome between the preceding visit and the visit before that. The following code simulates the data in R:
expit < function(x) {
exp(x)/(1+exp(x))
}
set.seed(1234)
n < 500
library(MASS)
#we will make correlation with baseline the same to visit 1 and visit 2
corr < matrix(1, nrow=4, ncol=4) + diag(0.5, nrow=4)
corr
data < mvrnorm(n, mu=c(0,0,0,0), Sigma=corr)
trt < 1*(runif(n)<0.5)
y0 < data[,1]
y1 < data[,2]
y2 < data[,3]
y3 < data[,4]
#add in effect of treatment
y1 < y1+trt*0.5
y2 < y2+trt*1
y3 < y3+trt*1.5
#now make some patients dropout before visit 1
#r1=1 indicates visit 1 observed
r1 < 1*(runif(n)<expit(3y0))
#dropout before visit 2, based on change from y0 to y1
r2 < 1*(runif(n)<expit(3(y1y0)))
r2[r1==0] < 0
#dropout before visit 3, based on change from y1 to y2
r3 < 1*(runif(n)<expit(3(y2y1)))
r3[r2==0] < 0
y1[r1==0] < NA
y2[r2==0] < NA
y3[r3==0] < NA
wideData < data.frame(id=1:n, trt=trt, y0=y0, y1=y1, y2=y2, y3=y3)
summary(wideData)
longData < reshape(wideData, varying=c("y1", "y2", "y3"),
direction="long", sep="", idvar="id")
longData < longData[order(longData$trt,longData$id,longData$time),]
#export data to CSV
write.csv(longData, file="mmrmdat.csv", row.names = FALSE, na=".")
MMRM in Stata
We can fit the MMRM in Stata using the mixed command. We first import the csv data into Stata:
import delimited using mmrmdat.csv, clear
The following code fits the model using REML (restricted maximum likelihood):
mixed y i.time##c.y0 i.time##i.trt  id: , nocons residuals(unstructured, t(time)) reml
The first part specifies that the variable y is our outcome and that we want interactions between time (as a categorical variable) and the continuous baseline covariate y0, and between time and treatment group. We then use the  notation to tell Stata that the id variable indicates the different patients. Observations from different id values are assumed independent.
The nocons option after this tells Stata not to include a random intercept term for patient, which it would include by default. Instead, as described above, we specify in the last part of the call that we want to model the residuals using an unstructured covariance matrix. In this specification we must tell Stata which variable indicates which position each observation is in, which in the case of longitudinal data corresponds to the time or visit variable. The last specification is to request REML rather than the default of maximum likelihood. Running this we obtain:
Mixedeffects REML regression Number of obs = 1,270
Group variable: id Number of groups = 458
Obs per group:
min = 1
avg = 2.8
max = 3
Wald chi2(8) = 967.88
Log restrictedlikelihood = 1583.45 Prob > chi2 = 0.0000

y  Coef. Std. Err. z P>z [95% Conf. Interval]
+
time 
2  .1381132 .0655449 2.11 0.035 .0096475 .2665789
3  .0846624 .0642857 1.32 0.188 .0413351 .21066

y0  .7505095 .0370394 20.26 0.000 .6779137 .8231053

time#c.y0 
2  .0030969 .0419305 0.07 0.941 .0790855 .0852792
3  .0860945 .0414578 2.08 0.038 .1673502 .0048387

1.trt  .4277844 .0836791 5.11 0.000 .2637764 .5917924

time#trt 
2 1  .337564 .0934798 3.61 0.000 .1543469 .520781
3 1  .8713972 .091424 9.53 0.000 .6922095 1.050585

_cons  .0747502 .0592104 1.26 0.207 .0413001 .1908004


Randomeffects Parameters  Estimate Std. Err. [95% Conf. Interval]
+
id: (empty) 
+
Residual: Unstructured 
var(e1)  .80163 .0531476 .7039468 .9128682
var(e2)  .8411992 .0583165 .7343261 .9636265
var(e3)  .7903076 .0573347 .6855573 .9110633
cov(e1,e2)  .3482899 .0437791 .2624845 .4340953
cov(e1,e3)  .3728598 .044828 .2849986 .4607209
cov(e2,e3)  .3081405 .0443681 .2211807 .3951003

LR test vs. linear model: chi2(5) = 190.51 Prob > chi2 = 0.0000
The inferences for the fixed effects are by default based on assuming the parameter estimates are normally distributed, which they are asymptotically. Since sometimes trials can have somewhat limited sample sizes, it is customary to use the modifications developed by Kenward and Roger, which makes adjustments to the standard errors and uses tdistributions for inference rather than zdistributions. We can do this by adding dfmethod(kroger):
mixed y i.time##c.y0 i.time##i.trt  id:, nocons residuals(unstructured, t(time)) reml dfmethod(kroger)
which gives the following output:
Mixedeffects REML regression Number of obs = 1,270
Group variable: id Number of groups = 458
Obs per group:
min = 1
avg = 2.8
max = 3
DF method: KenwardRoger DF: min = 411.26
avg = 435.56
max = 455.00
F(8, 647.70) = 120.25
Log restrictedlikelihood = 1583.45 Prob > F = 0.0000

y  Coef. Std. Err. t P>t [95% Conf. Interval]
+
time 
2  .1381132 .0655506 2.11 0.036 .0092758 .2669507
3  .0846624 .0643149 1.32 0.189 .0417645 .2110894

y0  .7505095 .0370394 20.26 0.000 .67772 .8232989

time#c.y0 
2  .0030969 .0419384 0.07 0.941 .0793268 .0855206
3  .0860945 .0414839 2.08 0.039 .1676361 .0045529

1.trt  .4277844 .0836791 5.11 0.000 .263339 .5922298

time#trt 
2 1  .337564 .0934918 3.61 0.000 .1538136 .5213144
3 1  .8713972 .091467 9.53 0.000 .6915985 1.051196

_cons  .0747502 .0592104 1.26 0.207 .0416096 .1911099


Randomeffects Parameters  Estimate Std. Err. [95% Conf. Interval]
+
id: (empty) 
+
Residual: Unstructured 
var(e1)  .80163 .0531476 .7039468 .9128682
var(e2)  .8411992 .0583165 .7343261 .9636265
var(e3)  .7903076 .0573347 .6855573 .9110633
cov(e1,e2)  .3482899 .0437791 .2624845 .4340953
cov(e1,e3)  .3728598 .044828 .2849986 .4607209
cov(e2,e3)  .3081405 .0443681 .2211807 .3951003

LR test vs. linear model: chi2(5) = 190.51 Prob > chi2 = 0.0000
In our case the KenwardRoger adjustments make relatively little difference, because our trial is moderately large. Lastly, we can sum the main effect of treatment with the interaction terms to obtain the estimated treatment effects at each of the three visits, with 95% CIs and pvalues:
. lincom 1.trt
( 1) [y]1.trt = 0

y  Coef. Std. Err. z P>z [95% Conf. Interval]
+
(1)  .4277844 .0836791 5.11 0.000 .2637764 .5917924

. lincom 1.trt+2.time#1.trt
( 1) [y]1.trt + [y]2.time#1.trt = 0

y  Coef. Std. Err. z P>z [95% Conf. Interval]
+
(1)  .7653484 .0884357 8.65 0.000 .5920177 .9386791

. lincom 1.trt+3.time#1.trt
( 1) [y]1.trt + [y]3.time#1.trt = 0

y  Coef. Std. Err. z P>z [95% Conf. Interval]
+
(1)  1.299182 .0887129 14.64 0.000 1.125308 1.473056

Interestingly we see that when we use lincom to estimate the treatment effects at each visit/time, Stata uses normal based inferences rather than tbased inferences. I am not using Stata very much these days, so am not as familiar with mixed as I used to be – there is almost certainly a way to respecify the model so that we can obtain the treatment effect estimates at each visit directly in the mixed output, using tbased inferences with the KenwardRoger method – if anyone can let me know I’d be grateful and will update the post.
MMRM in SAS
The MMRM can be fitted in SAS using PROC MIXED. After importing the csv file into SAS, we can fit the model using:
proc mixed data=work.longdata;
class trt time id;
model y = y0*time trt*time / SOLUTION DDFM=KenwardRoger;
repeated time / subject=id type=UN;
estimate 'visit 1 trt eff' trt*time 1 0 0
1 0 0 / cl;
estimate 'visit 2 trt eff' trt*time 0 1 0
0 1 0 / cl;
estimate 'visit 3 trt eff' trt*time 0 0 1
0 0 1 / cl;
run;
The model line specifies the fixed effects structure, that we would like SAS to print the estimates of the fixed effects parameters (SOLUTION) , and that we would like the Kenward Rogers modifications. The repeated line then specifies that we would like an unstructured residual covariance matrix, with subjects (patients) identified by the id variable, and the time variable indicating the position (visit/time) of the observation. The estimate lines then request the linear combinations that give us the estimated treatment effect at each of the three visits. Running this we obtain the output here. As we should expect, we obtain identical point estimates to Stata for the treatment effect at each visit. The standard errors differ slightly, which I think is because SAS is using the KenwardRoger SEs for the estimates/linear combinations, whereas as noted earlier, Stata seems to revert to normal based inferences when using lincom after mixed.
MMRM in R
Lastly, we fit the model in R. Linear mixed models are often fitted in R using the lme4 package, with the lmer function. This function however does not allow us to specify a residual covariance matrix which allows for dependency. We thus instead use the gls in the older nlme package (see bottom of post for update using MMRM). We can fit the model using:
library(nlme)
mmrm < gls(y~y0*factor(time)+trt*factor(time),
na.action=na.omit, data=longData,
correlation=nlme::corSymm(form=~time  id),
weights=nlme::varIdent(form=~1time))
summary(mmrm)
To specify the unstructured residual covariance matrix, we use the correlation and weights arguments. The corSymm correlation specifies an unstructured correlation matrix, with the time variable indicating the position and the id variable specifying unique patients. The varIdent weight argument then specifies that we want to allow a distinct variance for each followup visit. These two specifications together specify that we want an unstructured covariance matrix for the vector of repeated measures for each patient. Running the preceding code we obtain:
Generalized least squares fit by REML
Model: y ~ y0 * factor(time) + trt * factor(time)
Data: longData
AIC BIC logLik
3196.9 3273.995 1583.45
Correlation Structure: General
Formula: ~time  id
Parameter estimate(s):
Correlation:
1 2
2 0.424
3 0.468 0.378
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1  time
Parameter estimates:
1 2 3
1.000000 1.024376 0.992909
Coefficients:
Value Std.Error tvalue pvalue
(Intercept) 0.0747502 0.05921057 1.262446 0.2070
y0 0.7505095 0.03703947 20.262422 0.0000
factor(time)2 0.1381132 0.06554485 2.107155 0.0353
factor(time)3 0.0846631 0.06428595 1.316976 0.1881
trt 0.4277844 0.08367935 5.112186 0.0000
y0:factor(time)2 0.0030969 0.04193048 0.073858 0.9411
y0:factor(time)3 0.0860943 0.04145797 2.076665 0.0380
factor(time)2:trt 0.3375639 0.09347969 3.611093 0.0003
factor(time)3:trt 0.8713966 0.09142441 9.531333 0.0000
Correlation:
(Intr) y0 fct()2 fct()3 trt y0:()2 y0:()3 fc()2:
y0 0.101
factor(time)2 0.511 0.051
factor(time)3 0.493 0.050 0.411
trt 0.701 0.009 0.358 0.345
y0:factor(time)2 0.050 0.500 0.092 0.039 0.004
y0:factor(time)3 0.048 0.478 0.039 0.097 0.004 0.400
factor(time)2:trt 0.355 0.004 0.695 0.286 0.506 0.000 0.002
factor(time)3:trt 0.343 0.004 0.287 0.699 0.490 0.002 0.021 0.409
Standardized residuals:
Min Q1 Med Q3 Max
3.13100107 0.69487792 0.02639147 0.63569185 3.92499534
Residual standard error: 0.8953408
Degrees of freedom: 1270 total; 1261 residual
Comparing with the earlier output from Stata and SAS, we can see the estimates and standard errors are identical to the ones without KenwardRoger adjustments. To construct estimates and confidence intervals for the treatment effect at each visit, we can make use of the multcomp package as follows, constructing the linear combinations based on the coefficients in the model:
library(multcomp)
confint(glht(mmrm, linfct = matrix(c(0,0,0,0,1,0,0,0,0), nrow=1)))
confint(glht(mmrm, linfct = matrix(c(0,0,0,0,1,0,0,1,0), nrow=1)))
confint(glht(mmrm, linfct = matrix(c(0,0,0,0,1,0,0,0,1), nrow=1)))
which produces:
confint(glht(mmrm, linfct = matrix(c(0,0,0,0,1,0,0,0,0), nrow=1)))
Simultaneous Confidence Intervals
Fit: nlme::gls(model = y ~ y0 * factor(time) + trt * factor(time),
data = longData, correlation = nlme::corSymm(form = ~time 
id), weights = nlme::varIdent(form = ~1  time), na.action = na.omit)
Quantile = 1.96
95% familywise confidence level
Linear Hypotheses:
Estimate lwr upr
1 == 0 0.4278 0.2638 0.5918
> confint(glht(mmrm, linfct = matrix(c(0,0,0,0,1,0,0,1,0), nrow=1)))
Simultaneous Confidence Intervals
Fit: nlme::gls(model = y ~ y0 * factor(time) + trt * factor(time),
data = longData, correlation = nlme::corSymm(form = ~time 
id), weights = nlme::varIdent(form = ~1  time), na.action = na.omit)
Quantile = 1.96
95% familywise confidence level
Linear Hypotheses:
Estimate lwr upr
1 == 0 0.7653 0.5920 0.9387
> confint(glht(mmrm, linfct = matrix(c(0,0,0,0,1,0,0,0,1), nrow=1)))
Simultaneous Confidence Intervals
Fit: nlme::gls(model = y ~ y0 * factor(time) + trt * factor(time),
data = longData, correlation = nlme::corSymm(form = ~time 
id), weights = nlme::varIdent(form = ~1  time), na.action = na.omit)
Quantile = 1.96
95% familywise confidence level
Linear Hypotheses:
Estimate lwr upr
1 == 0 1.2992 1.1253 1.4731
As far as I am aware, although there are packages (e.g. pbkrtest) in R for calculating KenwardRoger degrees of freedom for mixed models fitted using lmer from the lme4 package, there aren’t any for the gls function in the nlme package. According to Søren Højsgaard, the pbkrtest package will have KenwardRoger functionality for gls added soon.
Other options
The mixed model / MMRM we have fitted here can obviously be modified in various ways. One aspect that could be modified is to relax the assumption that the covariance matrix is the same in the two treatment arms. This can be relaxed in Stata and SAS easily, but as far I ever been able to ascertain this is not possible to do using the glm function in nlme in R.
MMRM package in R
November 2022 update – a new package for fitting MMRMs in R has been released. See this post for more details.
Hi Jonathan,
Thanks for the nice post. Couple comments:
R code
At line `data < MASS::mvrnorm(n, mu=c(2,0,0,0,0), Sigma=corr)`, I think the argument `c(2,0,0,0,0)` contains an extra `0`, or is it the `2` is extra(?), so the code breaks. Either way, I can't seem to replicate the MMRM output in Stata. Simulating the dataset using `c(0,0,0,0)`, there are 1270 observations instead of your 988. Using `c(2,0,0,0)`, there are 975 observations. Could you clarify how the argument should be specified?
nocons
Regarding `id: , nocons`, it doesn't seem clear how the model does not estimate a random intercept a random id intercept is specified. The closest explanation I can find is that `mixed` doesn't actually estimate the random intecept for each person (ref: https://www.stata.com/statalist/archive/201307/msg00401.html). Instead, it estimates the variance of the intercepts. I don't follow why a random intercept should not be estimated (by stating the `nocons` option). Could you also help clarify this please?
Many thanks.
Hi Joanna. Many thanks for this.
R code – thanks for spotting this! I had been playing around with different versions of the data (with an extra baseline variable) and evidently didn’t copy and paste across the correct final R code for which the model results correspond. I have modified the code and all outputs – hopefully you should be able to get them to match, but please let me know if not.
nocons
The model we want to fit doesn’t include any patient level random effects, but instead models the dependency through allowing the residual errors to be correlated. To achieve this in Stata in mixed, we have to use the  id: form to tell Stata which variable observations are clustered by. By default Stata would then include a random intercept term, which we don’t want here. The nocons option in this position tells Stata not to include these. That they are not there can be seen in the model output in that in the first block ‘Randomeffects Parameters’ it says under id that it is empty. Instead, below this we can see the elements of estimated covariance matrix for the residual errors.
Best wishes, Jonathan.
Thanks Jonathan for the clarifications — the code works!
R code.
Only suggestion is to add `library(MASS)` at first line of script so R knows to load it.
nocons
I follow your explanation of what `nocons` does, but why would we NOT want a random intercept term?
I tried running the model with and without `nocons`: some estimates and 95% CI change in their 3rd and higher decimal places but the overall answer does not. Maybe it’s not a big deal to include or exclude the random intercept term(?)
Many thanks.
Thanks Joanna!
The idea is that we want to fit the most flexible/general multivariate normal model to reduce the possibility of model misspecification. The most general multivariate normal model assumes no particular structure for the variance/covariance matrix of the repeated observations, and this is what the unstructured residual covariance specification achieves. I am surprised that Stata will fit the model with a random intercept plus unstructured residual covariance matrix, as I would have thought it is not identifiable, since in terms of the covariance structure the unstructured model is already saturated / the most complex possible. Perhaps someone else can explain why Stata is still able to fit such a model.
Thanks Jonathan for the helpful explanation, appreciated.
A long while ago I looked at the R code for lme and gls to see if one could easily add KR style adjustments. I gave up seeing that effectively one needs to rewrite so much additional code and effectively rerun the whole model again. The reason is the parameterization of the covariance matrix. The KR approximation uses a Taylor series expansion based on the Covariance matrix itself, whereas R is using variances and correlations to parameterize. Perhaps there is some clever trick to get around this but I never found it in time. My hat off to those who manage it.
Perhaps a useful note is that the the adjusted values are invariant to reparameterization where the covariance matrix is intrinsically linear, or where the inverse of the covariance matrix is intrinsically linear (i.e. the covariance or its inverse can be expressed linearly even if they are not). This is identified in the second paper (the basis for KR2 in SAS and I think as used by Stata). But this invariance does require inclusion of the extra term accounting for potential bias in the mle of the covariance parameters.[Kenward & Roger, Computational Statistics and Data Analysis 53 (2009) 25832595]
James Roger.
Thanks for this James!
Hi Jonathan
Thanks a lot for summarizing this. The only option we have found to implement different covariance structures per group in R is via package glmmTMB which is more recent than nlme and also supports a range of other covariance structures (see here: https://cran.rproject.org/web/packages/glmmTMB/vignettes/covstruct.html). A trick to implement different covariance matrices per group is described here: https://stat.ethz.ch/pipermail/rsigmixedmodels/2020q4/029135.html
Unfortunately, as far as I can see, glmmTMB does also currently not support df adjustments.
Cheers,
Marcel
That’s great to know Marcel, thank you!
Hi Jonathan,
Happy New Year, and thanks for the nice MMRM post! We looked into R implementations last year and found a way to use lme4 and lmerTest together to fit an unstructured covariance matrix MMRM model. It is not perfect (since it has one variance parameter too much) but works very well usually and we can get Satterthwaite adjusted d.f. that match the SAS results.
See https://www.linkedin.com/pulse/mmrmrpresentedrpharmadanielsaban%25C3%25A9sbov%25C3%25A9/?trackingId=B1elol9kqrlPH5tLg3hy8Q%3D%3D for more details.
Best,
Daniel
And finally, the MMRM in R, calculated with GLS, is working!
https://openpharma.github.io/mmrm/
To get tbased estimates of treatment effect at T=2 and T=3 in Stata:
Refit model with ib2.time in place of i.time, and look at the 1.trt row
Refit model with ib3.time in place of i.time, and look at the 1.trt row
Hi, Thank you for this article. Do you know how to calculate the least squares mean for mmrm in R?
To Peter [April 28, 2021] Please check the emmeans package. It’s the continuation of the lsmeans, which gives you exactly what you need and works with a broad set of models, including of course those from nlme, lme4 and glmmTMB. It’s my daily workhorse in clinical trials with R. By the way, for the nlme::lme and gls it offers the Satterthwaite correction (both exact and iterative, if the default fails to compute), which is also of great interest in the pharmaceutical industry in case of small samples. The KR is supported natively for lme4 models too. emmeans is so capacious in features, so it takes some time to read it all and “embrace” 🙂 I strongly recommend checking it.
Thank you very much for this, Jonathan.
One trouble I’m having is matching the MMRM results in R with ANCOVA (i.e., using baseline and the last visit only) when there is no missing data. In that case, MMRM (with unstructured variancecovariance matrix) and ANCOVA should give identical results, but I’m unable to get a match. For ANCOVA, I used time=3 and compared it with results for ‘factor(time)3 : trt’ from MMRM. For MMRM I modified the code to
mmrm < gls(y~y0 + factor(time) + trt*factor(time),
na.action=na.omit, data=longData,
correlation=nlme::corSymm(form=~time  id),
weights=nlme::varIdent(form=~1time))
That is, I replaced y0*factor(time) with y0 + factor(time). I also tried it with y0*factor(time). In both cases, I couldn't get a match with ANCOVA.
Do you know why the two sets of results don't match?
Thanks,
Jitendra