Mixed effects logistic regression
I'm attempting to implement mixed effects logistic regression in python. As a point of comparison, I'm using the glmer function from the lme4 package in R.
I've found that the statsmodels module has a BinomialBayesMixedGLM that should be able to fit such a model. However, I've encountered a number of issues:
- I find the documentation for the statsmodels function to be not entirely helpful or clear, so I'm not completely sure how to use it appropriately.
- So far, my attempts have not produced results that replicate what I get when fitting the model with glmer in R.
- I expect the BinomialBayesMixedGLM function does not calculate p-values since it is Bayesian, but I can't seem to figure out how to access the full posterior distributions for the parameters.
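On the posterior question, the variational fit in statsmodels exposes its Gaussian posterior approximation directly. The sketch below is based on the documented fe_mean/fe_sd/vcp_mean/vcp_sd attributes of BayesMixedGLMResults and uses synthetic data as a stand-in for titanic.csv; note that `0 + C(Pclass)` in the variance-component formula requests one random intercept per class, analogous to `(1 | Pclass)` in lme4:

```python
import numpy as np
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

# Synthetic stand-in for titanic.csv with the columns used in the question.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({"Age": rng.uniform(1, 80, n),
                   "Pclass": rng.integers(1, 4, n)})
eta = 1.0 - 0.04 * df["Age"]
df["Survived"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)

# "0 + C(Pclass)" gives one random intercept per class, like (1 | Pclass).
r = {"Pclass": "0 + C(Pclass)"}
mod = smgb.BinomialBayesMixedGLM.from_formula("Survived ~ Age", r, df)
fit = mod.fit_vb()  # variational Bayes: Gaussian approximation to the posterior

# Posterior means and SDs of the approximation (documented attributes):
print("fixed effects:", fit.fe_mean, fit.fe_sd)
print("variance components (log-SD scale):", fit.vcp_mean, fit.vcp_sd)
```

This gives the mean and standard deviation of the approximate posterior for each parameter rather than full posterior samples; I have not tested it against the actual titanic file.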
As a test case, I'm using the titanic dataset available here.
import os
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb
titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))
r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()
fit.summary()
# Type Post. Mean Post. SD SD SD (LB) SD (UB)
# Intercept M 3.1623 0.3616
# Age M -0.0380 0.0061
# Pclass V 0.0754 0.5669 1.078 0.347 3.351
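One note on reading that summary (my inference from the statsmodels documentation, worth double-checking): the posterior mean and SD in the V-type Pclass row are on the log standard-deviation scale, and the SD / SD (LB) / SD (UB) columns are exp-transforms of them:

```python
import math

post_mean, post_sd = 0.0754, 0.5669  # Pclass row above, log-SD scale
print(math.exp(post_mean))                # ~1.078, the SD column
print(math.exp(post_mean - 2 * post_sd))  # ~0.347, SD (LB)
print(math.exp(post_mean + 2 * post_sd))  # ~3.351, SD (UB)
```

On that scale, glmer's Std.Dev. of 0.9254 falls well inside the (0.347, 3.351) interval, so the two fits disagree less than the raw numbers suggest.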
However, aside from the slope for Age, this doesn't appear to match what I get in R with glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial"):
Random effects:
Groups Name Variance Std.Dev.
Pclass (Intercept) 0.8563 0.9254
Number of obs: 887, groups: Pclass, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.961780 0.573402 1.677 0.0935 .
Age -0.038708 0.006243 -6.200 5.65e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So what error am I making when creating my model in Python? And, once that's addressed, how can I extract either the posteriors or the p-values? Lastly, are there any Python implementations of mixed-effects logistic regression that are more akin to the implementation in R?
Solution 1:[1]
I just had to do something similar in Python. As suggested in the comments, Pymer4 appeared to offer a suitable approach (especially if you are already familiar with R).
Using the example dataset 'titanic' referred to in the question:
from pymer4.models import Lmer
model = Lmer("Survived ~ Age + (1|Pclass)",
data=titanic, family = 'binomial')
print(model.fit())
OUT:
Formula: Survived~Age+(1|Pclass)
Family: binomial Inference: parametric
Number of observations: 887 Groups: {'Pclass': 3.0}
Log-likelihood: -525.812 AIC: 1057.624
Random effects:
Name Var Std
Pclass (Intercept) 0.856 0.925
No random effect correlations specified
Fixed effects:
Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci \
(Intercept) 0.962 -0.162 2.086 0.573 2.616 0.85 8.050
Age -0.039 -0.051 -0.026 0.006 0.962 0.95 0.974
Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig
(Intercept) 0.723 0.460 0.889 1.677 0.093 .
Age 0.490 0.487 0.493 -6.200 0.000 ***
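The Z-stat and P-val columns above are standard Wald tests, so as a sanity check they can be reproduced by hand from the Estimate and SE columns using only the standard library:

```python
import math

def wald_z_p(estimate, se):
    """Two-sided Wald test: z = estimate / se, p = 2 * (1 - Phi(|z|))."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # erfc form of the two-sided p
    return z, p

z, p = wald_z_p(-0.038708, 0.006243)  # Age row from the glmer output
print(round(z, 3), p)  # z ~ -6.200, p ~ 5.6e-10, matching glmer's 5.65e-10
```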
Just as an additional comment (sorry for diverting from the main question): I ran this on an Ubuntu 20.04 machine with Python 3.8.8. I'm not sure if anyone else has encountered this, but when running the model above, Pymer4 threw an error (the same error occurred when I tried to reproduce a similar model from the Pymer4 documentation here):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
referring to this line in the /pymer4/models/Lmer.py package file:
--> 444 if design_matrix:
I fixed this by changing that line to the following (not sure if this is the most elegant or safest approach; happy to be corrected here):
if design_matrix.any():
This appeared to get the package running and provide R-equivalent results in the few cases that I tested.
A better approach suggested in the comments below may be:
if design_matrix is not None:
Thanks to Giacomo Petrillo for pointing that out; I have not tested it, however.
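For context on why the original line fails and why the `is not None` form is preferable: a NumPy array with more than one element has no defined truth value, and `.any()` would also wrongly treat a supplied but all-zero design matrix as absent. A minimal illustration:

```python
import numpy as np

design_matrix = np.zeros((2, 2))  # a supplied, but all-zero, matrix

try:
    if design_matrix:  # the pattern from the original line 444
        pass
except ValueError as err:
    print(err)  # "The truth value of an array ... is ambiguous"

print(design_matrix.any())        # False: .any() would skip this matrix
print(design_matrix is not None)  # True: correctly detects it was supplied
```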
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow