How to manually calculate the residual and random-effects variance in a mixed model?
I have the following mixed model:
library(lme4)
mod <- lmer(NO.TRANS ~ SEX + (1 | ID), REML = FALSE)
when I check the summary, I get:
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: NO.TRANS ~ SEX + (1 | ID)

     AIC      BIC   logLik deviance df.resid
  1530.8   1542.8   -761.4   1522.8      142

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.5496 -0.6141  0.0230  0.5405  4.7476

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 1208     34.75
 Residual             1111     33.33
Number of obs: 146, groups:  ID, 74

Fixed effects:
            Estimate Std. Error t value
(Intercept)   72.869      7.122  10.231
SEXM           6.842      9.809   0.698

Correlation of Fixed Effects:
     (Intr)
SEXM -0.726
As one can see, the variance of the random intercepts is 1208, and the residual variance is 1111.
After some research, I know that the right way to calculate these is not var(resid(mod)) or var(coef(mod)$ID[,1]). However, I don't know what the right way is. I know that VarCorr() can give these values, but I need to calculate them manually, and I can't find the answer anywhere online or in my books. Can someone please help me?
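(For illustration, a quick check, assuming mod is fitted as above, confirms that neither naive computation reproduces the summary values:)

var(resid(mod))         ## != 1111: conditional residuals are shrunken
var(coef(mod)$ID[, 1])  ## != 1208: per-ID intercepts are shrunken
                        ## toward the population mean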
This is the data:
ID SEX NO.TRANS
1 215 F 20
2 243 F 94
3 354 F 146
4 535 F 111
5 543 F 43
6 1025 F 72
7 1304 F 37
8 1305 F 35
9 1350 F 73
10 1502 F 117
11 1530 F 87
12 2205 F 22
13 2550 F 35
14 3024 F 34
15 3035 F 361
16 3042 F 120
17 3051 F 62
18 3302 F 109
19 3330 F 90
20 3504 F 38
21 3540 F 87
22 4013 F 41
23 4023 F 8
24 4025 F 157
25 4202 F 121
26 4205 F 59
27 4301 F 53
28 4304 F 66
29 4305 F 0
30 4410 F 109
31 5043 F 69
32 5203 F 84
33 5301 F 75
34 5520 F 93
35 135 M 6
36 225 M 53
37 251 M 65
38 425 M 86
39 452 M 60
40 534 M 78
41 1045 M 70
42 1055 M 103
43 1111 M 68
44 1404 M 0
45 2015 M 83
46 2033 M 92
47 2405 M 59
48 2440 M 43
49 2501 M 70
50 2503 M 99
51 3015 M 106
52 3023 M 0
53 3044 M 104
54 3150 M 101
55 3204 M 202
56 3305 M 79
57 3401 M 164
58 3403 M 68
59 3420 M 0
60 4041 M 17
61 4053 M 117
62 4104 M 157
63 4105 M 185
64 4302 M 72
65 5015 M 89
66 5025 M 86
67 5032 M 84
68 5042 M 94
69 5105 M 121
70 5250 M 67
71 5302 M 74
72 5403 M 45
73 215 F 0
74 243 F 107
75 354 F 131
76 535 F 87
77 543 F 83
78 1025 F 18
79 1304 F 21
80 1305 F 42
81 1350 F 49
82 1502 F 73
83 1530 F 114
84 2205 F 75
85 2550 F 90
86 3024 F 10
87 3035 F 164
88 3042 F 71
89 3051 F 109
90 3302 F 45
91 3330 F 20
92 3504 F 83
93 3540 F 63
94 4013 F 0
95 4023 F 53
96 4025 F 149
97 4035 F 117
98 4202 F 55
99 4205 F 78
100 4301 F 0
101 4304 F 28
102 4305 F 55
103 4410 F 107
104 5043 F 84
105 5203 F 39
106 5301 F 7
107 5520 F 50
108 135 M 89
109 225 M 56
110 251 M 26
111 342 M 62
112 425 M 68
113 452 M 92
114 534 M 37
115 1045 M 78
116 1055 M 99
117 1111 M 88
118 1404 M 0
119 2015 M 101
120 2033 M 24
121 2405 M 0
122 2440 M 72
123 2501 M 104
124 2503 M 113
125 3015 M 128
126 3023 M 75
127 3044 M 82
128 3150 M 92
129 3204 M 126
130 3305 M 103
131 3401 M 163
132 3403 M 26
133 3420 M 34
134 4041 M 85
135 4053 M 159
136 4104 M 167
137 4105 M 103
138 4302 M 69
139 5015 M 51
140 5025 M 41
141 5032 M 53
142 5042 M 47
143 5105 M 80
144 5250 M 67
145 5302 M 126
146 5403 M 94
Thanks in advance!
Solution 1:
I may have misunderstood your question: I thought "how do I manually calculate ...?" meant "how do I calculate these values directly from the data without using a mixed model package?", whereas I now think you meant "how do I extract these values from the fitted model?"
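In case the first interpretation was what you wanted, here is a rough sketch of computing the ML estimates from scratch, by numerically maximizing the profiled marginal log-likelihood of the random-intercept model (the data frame name dat and the starting values are assumptions; lmer does the same thing far more efficiently):

## profiled negative log-likelihood (up to an additive constant) of
## y ~ N(X beta, s2b * Z Z' + s2 * I), with beta profiled out by GLS
negll <- function(logpars, y, X, g) {
    s2b <- exp(logpars[1])   ## among-ID (random-intercept) variance
    s2  <- exp(logpars[2])   ## residual variance
    V   <- s2b * outer(g, g, "==") + s2 * diag(length(y))
    beta <- solve(crossprod(X, solve(V, X)), crossprod(X, solve(V, y)))
    r <- y - X %*% beta
    0.5 * (c(determinant(V)$modulus) + c(crossprod(r, solve(V, r))))
}
opt <- optim(c(log(1000), log(1000)), negll,
             y = dat$NO.TRANS, X = model.matrix(~ SEX, dat), g = dat$ID)
exp(opt$par)   ## should be close to c(1208, 1111)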
Values to get CIs for (very similar to the examples in ?lme4::bootMer):
f1 <- function(fitted) {
    c(re_var = c(VarCorr(fitted)[[1]]),  ## RE (among-ID) variance
      resid_var = sigma(fitted)^2)       ## residual variance
}
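Applied to the fitted model, this just recovers the point estimates shown in the summary:

f1(mod)   ## approximately c(re_var = 1208, resid_var = 1111)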
parametric bootstrap
With lme4::confint.merMod:
bb <- confint(mod,
              method = "boot",
              nsim = 1000,
              FUN = f1,
              ## cosmetic: text progress bar
              .progress = "txt",
              PBargs = list(style = 3))
Results:

              2.5 %   97.5 %
re_var     551.6002 1804.407
resid_var  795.4443 1495.629
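Equivalently, you can call lme4::bootMer directly and keep the full matrix of bootstrap replicates (a sketch; the seed is arbitrary):

set.seed(101)   ## arbitrary, for reproducibility
pb <- bootMer(mod, FUN = f1, nsim = 1000)
## percentile CIs from the replicate matrix (columns: re_var, resid_var)
t(apply(pb$t, 2, quantile, c(0.025, 0.975)))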
nonparametric bootstrap
Nonparametric bootstrapping for mixed models is a little bit tricky:
- you definitely shouldn't resample individual observations independently (this will mess up the grouping structure of the data), and you may have to decide whether to bootstrap within as well as between blocks (i.e. if you resample a block, should you also resample the observations within it?);
- if you have more than one grouping variable in your data (you don't here), you have to decide which blocks to resample;
- if you have crossed random effects, then it gets very hard to resample independent blocks ...
- the glmmboot package does block bootstrapping, but it looks like it only returns information about the fixed effects ...
library(lmeresampler)
system.time(
    bb3 <- bootstrap(mod, .f = f1, type = "reb", reb_type = 0, B = 1000)
)  ## approx. 21 seconds
## reb = "random effects block bootstrap";
## see vignette("lmeresampler") for details on reb_type
confint(bb3, type = "perc")
# A tibble: 2 × 6
  term      estimate lower upper type  level
  <chr>        <dbl> <dbl> <dbl> <chr> <dbl>
1 re_var       1208.  901. 2894. perc   0.95
2 resid_var    1111.  352.  805. perc   0.95
I'm a little worried that the reported CIs are so different for the RE variance: 551-1804 (parametric bootstrap) vs. 901-2894 (nonparametric REB bootstrap). Compare with the profile CIs from cc <- confint(mod, parm = "theta_", oldNames = FALSE); cc^2, which are closer to the parametric bootstrap estimates.
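Spelled out, that profile-CI comparison looks like this (with oldNames = FALSE the parameters are reported on the standard-deviation scale, so they are squared to get variances):

## profile CIs for the random-effect parameters (SD scale), squared
cc <- confint(mod, parm = "theta_", oldNames = FALSE)
cc^2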
A brief glance at plot(mod, id = 0.05, idLabels = ~.obs) shows that observation 15 is an outlier - observation 87 may be influential as well. My guess would be that these observations are responsible for the difference between nonparametric (which will include these observations) and parametric (which will simulate data consistent with the model).
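For reference, the diagnostic plot mentioned above:

## standardized residuals vs. fitted values, labelling the most
## extreme ~5% of points by observation number
plot(mod, id = 0.05, idLabels = ~.obs)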
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0. Source: Stack Overflow.
