Explained and Unexplained Effects don't add up to gap in Oaxaca-Blinder decomposition (statsmodels)
Here's a Python implementation of Oaxaca-Blinder decomposition analysis using statsmodels.
I have two questions. First, how do I interpret the negative value for the Explained Effect here? Second, I expected the two effects (Explained and Unexplained) to add up to the Gap (0.07160), but here they sum to 0.06066. In the past, I've definitely seen the two add up to the Gap.
from statsmodels.stats.oaxaca import OaxacaBlinder
features = ['geo', 'figure', 'multipart', 'long']
ob = OaxacaBlinder(endog=df['rate'], exog=df[features], bifurcate='geo')
ob.two_fold().summary()
ob.three_fold().summary()
Oaxaca-Blinder Two-fold Effects
Unexplained Effect: 0.06270
Explained Effect: -0.00204
Gap: 0.07160
Oaxaca-Blinder Three-fold Effects
Characteristic Effect: -0.00191
Coefficient Effect: 0.06242
Interaction Effect: 0.00014
Gap: 0.07160
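To make the mismatch concrete, here is a small arithmetic check on the numbers printed above. It only re-adds the reported values; the variable names are my own and are not part of statsmodels.

```python
# Values copied from the two summaries above (rounded as printed).
two_fold = {"unexplained": 0.06270, "explained": -0.00204}
three_fold = {"characteristic": -0.00191, "coefficient": 0.06242, "interaction": 0.00014}
gap = 0.07160

two_fold_sum = sum(two_fold.values())
three_fold_sum = sum(three_fold.values())

# Sum of the effects, then the shortfall relative to the reported gap.
print(round(two_fold_sum, 5), round(gap - two_fold_sum, 5))      # 0.06066 0.01094
print(round(three_fold_sum, 5), round(gap - three_fold_sum, 5))  # 0.06065 0.01095
```

Both decompositions fall short of the Gap by roughly the same amount (about 0.011), which suggests a single missing term rather than rounding noise.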
I've also run the underlying linear regression models and confirmed that the gap is identical to the difference between the mean predicted outcomes of the two models, so I believe the issue has to do with the individual Effects values.
from sklearn.linear_model import LinearRegression
us = df[df['geo']==0]
intl = df[df['geo']==1]
a = LinearRegression()
a.fit(us[features], us['rate'])
a.predict([us[features].mean().values]) # 0.59456195
b = LinearRegression()
b.fit(intl[features], intl['rate'])
b.predict([intl[features].mean().values]) # 0.66615859
0.66615859 - 0.59456195 # 0.07159663999999999 - same as gap above
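One thing that could be probed further is whether the group regressions behind the effects include an intercept. The sketch below is not the statsmodels implementation: it is a hand-rolled two-fold decomposition on synthetic data (the data, `make_group`, and `two_fold` are all hypothetical), and it assumes the second group's coefficients as the reference. It only illustrates that the explained and unexplained parts sum to the raw gap when each group model has a constant, and generally do not when it is omitted. Whether this is what happens inside `OaxacaBlinder` is an assumption to check against its documentation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Synthetic data (hypothetical, NOT the df from the question).
    X = rng.normal(loc=shift, size=(n, 2))
    y = 2.0 + X @ np.array([0.3, -0.2]) + rng.normal(scale=0.1, size=n)
    return X, y

def two_fold(Xa, ya, Xb, yb, intercept):
    # Optionally prepend a column of ones so each OLS fit has a constant.
    if intercept:
        Xa = np.column_stack([np.ones(len(Xa)), Xa])
        Xb = np.column_stack([np.ones(len(Xb)), Xb])
    beta_a = np.linalg.lstsq(Xa, ya, rcond=None)[0]
    beta_b = np.linalg.lstsq(Xb, yb, rcond=None)[0]
    mean_a, mean_b = Xa.mean(axis=0), Xb.mean(axis=0)
    explained = (mean_b - mean_a) @ beta_b    # difference in characteristics
    unexplained = mean_a @ (beta_b - beta_a)  # difference in coefficients
    gap = yb.mean() - ya.mean()               # raw difference in mean outcomes
    return explained, unexplained, gap

Xa, ya = make_group(500, 0.0)
Xb, yb = make_group(500, 1.0)

with_const = two_fold(Xa, ya, Xb, yb, intercept=True)
no_const = two_fold(Xa, ya, Xb, yb, intercept=False)

for label, (explained, unexplained, gap) in [("with constant", with_const),
                                              ("no constant", no_const)]:
    print(label, "sum of effects:", explained + unexplained, "gap:", gap)
```

The identity holds in the first case because an OLS fit with an intercept passes through the point of means, so the mean prediction equals the mean outcome in each group. Note that `LinearRegression` above fits an intercept by default, which is consistent with its predictions reproducing the Gap.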
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
