r-mice, degrees of freedom incorrect when pooling multilevel imputations
I imputed a data set with missing data from 81 schools and 13,177 students. I used mice to compute 10 imputations. (Note: this is a simplification of the original problem I ran into, but the problem is reproduced in this simpler version.) I have the imputed data set saved as an SPSS file called ImputedDL.sav.
When I run any multilevel model and pool the results, the degrees of freedom are much too high for school-level variables. Here is an example with two school-level variables, "MinoritySchC" and "DisadvSchC", which represent, respectively, the proportion of students at the school who are coded as "minority" (not White or Asian) and the proportion who are coded as "disadvantaged" (free or reduced lunch). I also include a student-level variable, Female, coded 0 or 1.
The mice code is:
library(mice)
library(lmerTest)
library(broom.mixed)
library(haven)
library(dplyr)
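# Read the stacked imputations exported from SPSS
data <- read_sav("F:/scull data/NEW FILES 2022/2022 completed imputations/ImputedDl.sav")
# as.mids() expects the imputation index in .imp and the row id in .id
data <- rename(data, .imp = Imputation_)
data <- rename(data, .id = ID)
summary(data)
test2 <- as.mids(data)
# Fit the random-intercept model in each imputed data set
fit <- with(test2, lmer(Outcome ~ Female + MinoritySchC + DisadvSchC + (1 | SchoolID), REML = TRUE))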
pool(fit)
summary(pool(fit))
The output reports the following degrees of freedom for the school-level variables: Intercept df = 13,124.83, MinoritySchC df = 13,167.49, DisadvSchC df = 13,167.49
For the student-level variable, the degrees of freedom are: Female df = 9,546.37
Note that the df are HIGHER for the school-level variables than for the student-level variable, and actually equal to the number of students in the data set minus 10 (the number of imputations, perhaps?)
Using the Satterthwaite adjustment, the df for student-level variables should be a bit less than 13,177 and the df for school-level variables should be on the order of magnitude of the number of schools (81). The output degrees of freedom seem to be wrong, and make me worry that the entire pooling program might not be working correctly.
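For what it is worth, the pooled df reported above can be reproduced by hand, which hints at where they come from. The sketch below is a base-R version of the Barnard-Rubin (1999) small-sample df that pool() is documented to use. It rests on two assumptions: that the complete-data df is the residual df of the model, taken here as 13,177 observations minus 6 estimated parameters (4 fixed effects and 2 variance components), and that the proportion of variance due to missingness is floored at 1e-4 for the fully observed school-level variables. barnard_rubin_df is a hypothetical helper name, not a mice function.

```r
# Hypothetical helper: Barnard-Rubin (1999) degrees of freedom.
#   m      number of imputations
#   lambda proportion of total variance attributable to missingness
#   dfcom  complete-data degrees of freedom
barnard_rubin_df <- function(m, lambda, dfcom) {
  lambda <- max(lambda, 1e-4)            # assumed floor, avoids dividing by zero
  df_old <- (m - 1) / lambda^2           # Rubin's (1987) large-sample df
  df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  df_old * df_obs / (df_old + df_obs)
}

# Assumption: dfcom = n - p with n = 13177 students and p = 6 parameters
df_residual <- barnard_rubin_df(m = 10, lambda = 0, dfcom = 13177 - 6)
round(df_residual, 2)   # 13167.49, the value reported for MinoritySchC

# Same formula with a school-level dfcom (73.6 is the Satterthwaite df that
# lmer reports for MinoritySchC on imputation 1 in this post)
df_school <- barnard_rubin_df(m = 10, lambda = 0, dfcom = 73.6)
```

If this arithmetic is right, the large df would come from the complete-data df fed into the pooling formula, not from the number of imputations. The pooled estimates and standard errors follow Rubin's rules and do not depend on that value; only the reference t distribution does.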
To check my understanding, I ran lmer on imputation 1 alone, and the degrees of freedom it reports look correct. Here is the code, which first selects a single imputation and then runs the multilevel analysis.
imp1 <- data[which(data$.imp == 1), ]
model2 <- lmer(Outcome ~ Female + MinoritySchC + DisadvSchC + (1 | SchoolID), REML = TRUE, data = imp1)
summary(model2)
In contrast to the pooled run, lmer fitted to a single imputation reports degrees of freedom that look reasonable.
The output reports the following degrees of freedom for the school-level variables: Intercept df = 85.6, MinoritySchC df = 73.6, DisadvSchC df = 74.7
For the student-level variable, the degrees of freedom are: Female df = 13,110
Is there a way to get correct degrees of freedom when pooling multilevel data? Is the error in computing df symptomatic of other errors? That is, should I trust the mice program to do my pooling for me?
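One thing that may be worth trying: pool() takes a dfcom argument that overrides the complete-data degrees of freedom it would otherwise extract from the fitted models. The sketch below is only an illustration on simulated data, not a recommendation. The data frame toy, the variable SchVar, and the object fit_toy are hypothetical, and the choice dfcom = n_schools - 2 (schools minus the intercept and one school-level slope) is an assumption, since a single dfcom is applied to every coefficient, student-level ones included.

```r
library(mice)         # mice(), with(), pool()
library(lme4)         # lmer()
library(broom.mixed)  # tidy()/glance() methods that pool() relies on for lmer fits

set.seed(2022)
n_schools <- 30
n_per_school <- 20

# Toy two-level data: one school-level and one student-level predictor
toy <- data.frame(SchoolID = rep(seq_len(n_schools), each = n_per_school))
toy$SchVar  <- rep(rnorm(n_schools), each = n_per_school)
toy$Female  <- rbinom(nrow(toy), 1, 0.5)
toy$Outcome <- 0.5 * toy$SchVar + 0.2 * toy$Female +
  rep(rnorm(n_schools), each = n_per_school) + rnorm(nrow(toy))
toy$Outcome[sample(nrow(toy), 60)] <- NA   # missing values in the outcome only

# Flat (single-level) imputation, used here only to get a mids object quickly
imp <- mice(toy, m = 5, printFlag = FALSE)
fit_toy <- with(imp, lmer(Outcome ~ Female + SchVar + (1 | SchoolID), REML = TRUE))

# Default: dfcom is extracted from the fitted model
pooled_default <- summary(pool(fit_toy))

# Override: supply a school-level complete-data df (an assumption, see above)
pooled_school <- summary(pool(fit_toy, dfcom = n_schools - 2))

pooled_default[, c("term", "df")]
pooled_school[, c("term", "df")]
```

Comparing the two df columns shows how strongly the pooled df depend on dfcom. Whether a school-level value is appropriate for the student-level coefficient is a separate question that this sketch does not settle.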
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
