Different prediction results for a glm between sjPlot get_model_data and predict.glm, especially, but not only, with an averaged model
I ran a GLM with a negative binomial distribution to test the effect of two categorical variables on count data. I then used model averaging over all possible models and tried to predict the model's response using two methods (see code below):
- using `predict`
- using `get_model_data` from `sjPlot`
The model fits give the same results, but the CIs are much wider with `predict`. I also tried this with a simple (non-averaged) model and got closer, but still different, results.
```r
library(MASS)   # glm.nb
library(MuMIn)  # dredge, model.avg
library(dplyr)
library(sjPlot) # get_model_data

m1 <- glm.nb(C ~ A * B, data = Route)  # A has two levels and B has three levels
d1 <- dredge(m1)
m.avg1 <- model.avg(d1, fit = TRUE)

newdat <- expand.grid(B = c("B1", "B2", "B3"), A = c("A1", "A2"))

# Method 1: predict() with se.fit, CIs computed as fit +/- 1.96 * se
as.data.frame(predict(m.avg1, newdata = newdat, se.fit = TRUE, type = "response")) %>%
  mutate(CI = 1.96 * se.fit, CI.u = fit + CI, CI.l = fit - CI) %>%
  cbind(newdat)

# Method 2: sjPlot; although I use se = TRUE, I do not get an SE column
# in the output, just 95% CIs
get_model_data(m.avg1, type = "pred", terms = c("B", "A"), se = TRUE)
```
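One candidate explanation for a gap like this (an assumption worth checking, not a diagnosis of the averaged-model case) is the scale on which the interval is built: with a log link, `predict(..., type = "response")` returns SEs on the response scale, so `fit ± 1.96 * se` is symmetric, while `sjPlot`-style tools typically build the interval on the link scale and exponentiate, giving an asymmetric interval. A minimal sketch with a plain Poisson GLM on simulated data (no averaging involved) shows the two constructions side by side:

```r
set.seed(1)
d  <- data.frame(x = gl(2, 50), y = rpois(100, 10))
m  <- glm(y ~ x, family = poisson, data = d)
nd <- data.frame(x = factor(c("1", "2")))

# Response-scale interval: symmetric around the fitted value
pr <- predict(m, newdata = nd, se.fit = TRUE, type = "response")
ci_resp <- cbind(lwr = pr$fit - 1.96 * pr$se.fit,
                 upr = pr$fit + 1.96 * pr$se.fit)

# Link-scale interval, back-transformed: asymmetric around the fitted value
pl <- predict(m, newdata = nd, se.fit = TRUE, type = "link")
ci_link <- exp(cbind(lwr = pl$fit - 1.96 * pl$se.fit,
                     upr = pl$fit + 1.96 * pl$se.fit))
```

With a large SE (as can happen after model averaging) the two intervals can diverge substantially, so comparing the link-scale and response-scale constructions on the same model is a quick sanity check.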
Here is a link to the data on Google Sheets.
An example of the results:
- With `predict`, for A1*B1 I get predicted fit = 13.24, 95% CI = [10.23, 16.24]
- With `get_model_data`, for A1*B1 I get predicted fit = 13.24, 95% CI = [13.01, 13.46]
When doing the same for the original model m1 I get closer results: a difference of only 0.31 in the CI between the two methods (for the same example as above). Yet this is still a difference, and it keeps me up at night...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow