Errors attempting to use linearmodels.panel.PanelOLS entity effects (not time effects)

I have a Pandas DataFrame like (abridged):

age gender control county
11877 67.0 F 0 AL-Calhoun
11552 60.0 F 0 AL-Coosa
11607 60.0 F 0 AL-Talladega
13821 NaN NaN 1 AL-Mobile
11462 59.0 F 0 AL-Dale

I want to run a linear regression with fixed effects by county entity (not by time) to balance check my control and treatment groups for an experimental design, such that my dependent variable is membership in the treatment group (control = 1) or not (control = 0).

As far as I can tell, to do this I need to use linearmodels.panel.PanelOLS and set my entity field (county) as the index.

I believe my model should look like this:

# set index on entity effects field:
to_model = to_model.set_index(["county"])

# implement fixed effects linear model
model = PanelOLS.from_formula("control ~ age + gender + EntityEffects", to_model)

When I try to do this, I get the below error:

ValueError: The index on the time dimension must be either numeric or date-like

I have seen a lot of implementations of such models online and they all seem to use a temporal effect, which is not relevant in my case. If I try to encode my county field using numerics, I get a different error.

# create a dict to map county values to numerics
county_map = dict(zip(to_model["county"].unique(), range(len(to_model.county.unique()))))

# create a numeric column as alternative to county
to_model["county_numeric"] = to_model["county"].map(county_map)

# set index on numeric entity effects field
to_model = to_model.set_index(["county_numeric"])

Re-running the same from_formula call on this frame then gives:

FactorEvaluationError: Unable to evaluate factor `control`. [KeyError: 'control']

How am I able to implement this model using the county as a unit fixed effect?



Solution 1:[1]

Assuming you have multiple entries for each county, you can use the approach below. PanelOLS expects a two-level index (entity, then time), so the key step is a groupby transform that numbers the rows within each county; that counter serves as a fake time index.

import numpy as np
import pandas as pd
import string
import linearmodels as lm

# Generate a random DataFrame
rs = np.random.default_rng(1213892)
counties = rs.choice([c for c in string.ascii_lowercase], (1000, 3))
counties = np.array([["".join(c)] * 10 for c in counties]).ravel()
age = rs.integers(18, 65, (10 * 1000))
gender = rs.choice(["m", "f"], size=(10 * 1000))
control = rs.integers(0, 2, size=10 * 1000)
df = pd.DataFrame(
    {"counties": counties, "age": age, "gender": gender, "control": control}
)
# Construct a dummy numeric index for each county
numeric_index = df.groupby("counties").age.transform(lambda c: np.arange(len(c)))
df["numeric_index"] = numeric_index
df = df.set_index(["counties","numeric_index"])
# Take a look
df.head(15)
                        age gender  control
counties numeric_index                     
qbt      0               51      m        1
         1               36      m        0
         2               28      f        1
         3               28      m        0
         4               47      m        0
         5               19      m        1
         6               32      m        1
         7               54      m        0
         8               36      m        1
         9               52      m        0
nub      0               19      m        0
         1               57      m        0
         2               49      f        0
         3               53      m        1
         4               30      f        0
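
As a side note, the same within-county counter can probably be built more directly with pandas' `groupby(...).cumcount()`. The sketch below uses the question's column names on a small made-up frame; the `obs` column name and the sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame (values are made up)
to_model = pd.DataFrame(
    {
        "age": [67.0, 60.0, 60.0, np.nan, 59.0, 41.0],
        "gender": ["F", "F", "M", None, "F", "M"],
        "control": [0, 0, 0, 1, 0, 1],
        "county": [
            "AL-Calhoun", "AL-Coosa", "AL-Calhoun",
            "AL-Mobile", "AL-Coosa", "AL-Calhoun",
        ],
    }
)

# cumcount numbers the rows within each county 0, 1, 2, ...
# ("obs" is an arbitrary name for the fake time level)
to_model["obs"] = to_model.groupby("county").cumcount()

# Entity level first, fake time level second
panel = to_model.set_index(["county", "obs"])
```

The resulting `panel` has a unique (county, obs) index and could be passed as `data=` in place of `df` in the fit that follows.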

The fit below just shows that the model can be estimated with this index.

# Fit the model
# Note: Results are meaningless, just shows that this works
mod = lm.PanelOLS.from_formula("control ~ age + gender + EntityEffects", data=df)
mod.fit()
                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                control   R-squared:                        0.0003
Estimator:                   PanelOLS   R-squared (Between):              0.0005
No. Observations:               10000   R-squared (Within):               0.0003
Date:                Thu, May 12 2022   R-squared (Overall):              0.0003
Time:                        11:08:00   Log-likelihood                   -6768.3
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      1.4248
Entities:                         962   P-value                           0.2406
Avg Obs:                       10.395   Distribution:                  F(2,9036)
Min Obs:                      10.0000                                           
Max Obs:                       30.000   F-statistic (robust):             2287.4
                                        P-value                           0.0000
Time periods:                      30   Distribution:                  F(2,9036)
Avg Obs:                       333.33                                           
Min Obs:                       2.0000                                           
Max Obs:                       962.00                                           
                                                                                
                              Parameter Estimates                              
===============================================================================
             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-------------------------------------------------------------------------------
age            -0.0002     0.0004    -0.5142     0.6072     -0.0010      0.0006
gender[T.f]     0.5191     0.0176     29.559     0.0000      0.4847      0.5535
gender[T.m]     0.5021     0.0175     28.652     0.0000      0.4678      0.5365
===============================================================================

F-test for Poolability: 0.9633
P-value: 0.7768
Distribution: F(961,9036)

Included effects: Entity
PanelEffectsResults, id: 0x2246f38a9d0
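
For intuition about why the fake time index carries no information, here is a rough sketch using only NumPy and pandas. It assumes the standard result that entity fixed effects give the same slope estimates as subtracting each county's mean from every variable before running OLS (the within transformation), and checks that against a regression with one dummy per county. The data, the numeric `female` indicator, and all variable names are made up for illustration; this is not a description of how linearmodels computes its results, and it recovers point estimates only, not standard errors.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_counties, per_county = 50, 10
n = n_counties * per_county

# Made-up data: 50 counties with 10 observations each
df = pd.DataFrame(
    {
        "county": np.repeat(np.arange(n_counties), per_county),
        "age": rng.integers(18, 65, n).astype(float),
        "female": rng.integers(0, 2, n).astype(float),
        "control": rng.integers(0, 2, n).astype(float),
    }
)

cols = ["control", "age", "female"]

# Within transformation: subtract the county mean from each variable
within = df[cols] - df.groupby("county")[cols].transform("mean")
beta_within, *_ = np.linalg.lstsq(
    within[["age", "female"]].to_numpy(),
    within["control"].to_numpy(),
    rcond=None,
)

# Same regression with an explicit dummy column for every county
dummies = pd.get_dummies(df["county"], dtype=float).to_numpy()
X = np.column_stack([df[["age", "female"]].to_numpy(), dummies])
beta_lsdv, *_ = np.linalg.lstsq(X, df["control"].to_numpy(), rcond=None)

# The slopes on age and female agree between the two approaches
print(beta_within)
print(beta_lsdv[:2])
```

Neither version uses any ordering of the rows within a county, which is why an arbitrary counter is enough to satisfy the index requirement.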

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kevin S