'How to manually calculate the residual and random effects variance in a mixed-model?

I have the following mixed-model:

mod<-lmer(NO.TRANS~SEX+(1|ID),REML=F)

when I check the summary, I get:

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: NO.TRANS ~ SEX + (1 | ID)

     AIC      BIC   logLik deviance df.resid 
  1530.8   1542.8   -761.4   1522.8      142 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.5496 -0.6141  0.0230  0.5405  4.7476 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 1208     34.75   
 Residual             1111     33.33   
Number of obs: 146, groups:  ID, 74

Fixed effects:
            Estimate Std. Error t value
(Intercept)   72.869      7.122  10.231
SEXM           6.842      9.809   0.698

Correlation of Fixed Effects:
     (Intr)
SEXM -0.726

As one can see, the variance of the random effect (intercepts) is 1208. And the variance of the residuals is 1111.

After some research, I know the right way to calculate those is not var(resid(mod)) or var(coef(mod)$ID[,1]). However, I don't know what are the right way to calculate them. I know that VarCorr() can give those values, but I need to manually calculate them. And can't find the answer to this anywhere online or in my books. Can someone please help me?

This is the data:

    ID SEX NO.TRANS
1    215   F       20
2    243   F       94
3    354   F      146
4    535   F      111
5    543   F       43
6   1025   F       72
7   1304   F       37
8   1305   F       35
9   1350   F       73
10  1502   F      117
11  1530   F       87
12  2205   F       22
13  2550   F       35
14  3024   F       34
15  3035   F      361
16  3042   F      120
17  3051   F       62
18  3302   F      109
19  3330   F       90
20  3504   F       38
21  3540   F       87
22  4013   F       41
23  4023   F        8
24  4025   F      157
25  4202   F      121
26  4205   F       59
27  4301   F       53
28  4304   F       66
29  4305   F        0
30  4410   F      109
31  5043   F       69
32  5203   F       84
33  5301   F       75
34  5520   F       93
35   135   M        6
36   225   M       53
37   251   M       65
38   425   M       86
39   452   M       60
40   534   M       78
41  1045   M       70
42  1055   M      103
43  1111   M       68
44  1404   M        0
45  2015   M       83
46  2033   M       92
47  2405   M       59
48  2440   M       43
49  2501   M       70
50  2503   M       99
51  3015   M      106
52  3023   M        0
53  3044   M      104
54  3150   M      101
55  3204   M      202
56  3305   M       79
57  3401   M      164
58  3403   M       68
59  3420   M        0
60  4041   M       17
61  4053   M      117
62  4104   M      157
63  4105   M      185
64  4302   M       72
65  5015   M       89
66  5025   M       86
67  5032   M       84
68  5042   M       94
69  5105   M      121
70  5250   M       67
71  5302   M       74
72  5403   M       45
73   215   F        0
74   243   F      107
75   354   F      131
76   535   F       87
77   543   F       83
78  1025   F       18
79  1304   F       21
80  1305   F       42
81  1350   F       49
82  1502   F       73
83  1530   F      114
84  2205   F       75
85  2550   F       90
86  3024   F       10
87  3035   F      164
88  3042   F       71
89  3051   F      109
90  3302   F       45
91  3330   F       20
92  3504   F       83
93  3540   F       63
94  4013   F        0
95  4023   F       53
96  4025   F      149
97  4035   F      117
98  4202   F       55
99  4205   F       78
100 4301   F        0
101 4304   F       28
102 4305   F       55
103 4410   F      107
104 5043   F       84
105 5203   F       39
106 5301   F        7
107 5520   F       50
108  135   M       89
109  225   M       56
110  251   M       26
111  342   M       62
112  425   M       68
113  452   M       92
114  534   M       37
115 1045   M       78
116 1055   M       99
117 1111   M       88
118 1404   M        0
119 2015   M      101
120 2033   M       24
121 2405   M        0
122 2440   M       72
123 2501   M      104
124 2503   M      113
125 3015   M      128
126 3023   M       75
127 3044   M       82
128 3150   M       92
129 3204   M      126
130 3305   M      103
131 3401   M      163
132 3403   M       26
133 3420   M       34
134 4041   M       85
135 4053   M      159
136 4104   M      167
137 4105   M      103
138 4302   M       69
139 5015   M       51
140 5025   M       41
141 5032   M       53
142 5042   M       47
143 5105   M       80
144 5250   M       67
145 5302   M      126
146 5403   M       94

Thanks in advance!



Solution 1:[1]

I may have misunderstood your question: I thought "how do I manually calculate ...?" meant "how do I calculate these values directly from the data without using a mixed model package?", whereas I now think you meant "how do I extract these values from the fitted model?"

Values to get CIs of (very similar to examples in ?lme4::bootMer)

f1 <- function(fitted) {
  c(re_var = c(VarCorr(fitted)[[1]]),  ## RE variance
    resid_var  = sigma(fitted)^2)      ## residual variance
}

parametric bootstrap

With lme4::confint.merMod:

bb <- confint(mod,
              method = "boot",
              nsim = 1000,
              ## cosmetic:
              .progress = "txt",
              PBargs = list(style = 3),
             
              FUN = function(fitted) {
                c(re_var = c(VarCorr(fitted)[[1]]),
                  resid_var  = sigma(fitted)^2)
              })

Results:

             2.5 %   97.5 %
re_var    551.6002 1804.407
resid_var 795.4443 1495.629

nonparametric bootstrap

Nonparametric bootstrapping for mixed models is a little bit tricky;

  • you definitely shouldn't resample individual observations independently (this will mess up the grouping structure of the data), but you may have to make more choices about whether you bootstrap within as well as between blocks (i.e. if you resample a block, should you also resample the observations within blocks?)

  • if you have more than one grouping variable in your data (you don't) you have to decide which blocks to resample;

  • if you have crossed random effects then it gets very hard to resample independent blocks ...

  • the glmmboot package does block bootstrapping, but it looks like it only returns information about the fixed effects ...

library(lmeresampler)
system.time(
    bb3 <- bootstrap(mod, .f = f1, type ="reb", reb_type = 0, B=1000)
)  ## 21 seconds
## reb = "random effects block bootstrap"
## read the vignette("lmeresampler") for details on reb_type

confint(bb3, type = "perc")
# A tibble: 2 × 6
  term      estimate lower upper type  level
  <chr>        <dbl> <dbl> <dbl> <chr> <dbl>
1 re_var       1208.  901. 2894. perc   0.95
2 resid_var    1111.  352.  805. perc   0.95

I'm a little worried that the reported CIs are so different (e.g. 550-1800 (nonparametric) vs 901-2894 (compare with cc <- confint(mod, parm = "theta_", oldNames = FALSE); cc^2, which gives profile CIs; these are closer to the parametric bootstrap estimates).

A brief glance at plot(mod, id = 0.05, idLabels = ~.obs) shows that observation 15 is an outlier - observation 87 may be influential as well. My guess would be that these observations are responsible for the difference between nonparametric (which will include these observations) and parametric (which will simulate data consistent with the model).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1