'Hierarchical model for counts with categorical data in PyMC3 - help structuring

New here and working my way through examples from various references. I have seen multiple ways in which to structure hierarchical models with categorical variables in PyMC3. As a toy example, say I have a system that collects overdispersed count data (say units consumed) which can be broken down into multiple phases, is associated with a certain available geographic hierarchies (e.g. local, regional, etc.), and where there are multiple processes put into effect to influence the count outcome.

I’ve put together the following model structuring, thinking I could model each process separately - although it would be more efficient to add 'process' as well. I have seen examples of this structure online and in reference materials.

count = data['count'].values
phase_idx, phase = data['phase'].factorize(sort=True)
region_idx, region = data['region'].factorize(sort=True)
process_idx, process = process['process'].factorize(sort=True)

COORDS = {
    'obs_id': np.arange(len(phase_idx)),
    'phase': phase,
    'region': region,
    'process': process
}

with pm.Model(coords = COORDS) as count_model:
    
    data = pm.Data('counts', count, dims=('obs_id'))
    
    ## Hyperpriors 
    mu_lam = pm.Exponential('mu_lam', lam=0.3)
    a_lam = pm.Exponential('a_lam', lam=2)

    ## Priors
    μ = pm.Exponential('μ', lam= mu_lam, dims= ('phase', 'region'))
    α = pm.Exponential('α', lam= a_lam, dims = ('phase', 'region'))
    
    # Likelihood
    obs = pm.NegativeBinomial('obs', mu= μ[phase_idx, region_idx], \
                                         alpha= α[phase_idx, region_idx], observed= data, dims= ('obs_id'))


    idata_trace = pm.sample(draws= 10_000, tune=5_000, chains=2, target_accept= 0.95)

Is this valid way to structure the model? It samples fine with made up data and provides the output of interest - describing the uncertainty around μ for each categorical variable.

How about all three variables? Unfortunately, the three variables keep killing my kernel.

# Likelihood
obs = pm.NegativeBinomial('obs', mu= μ[phase_idx, region_idx, process_idx], \
                                      alpha= α[phase_idx, region_idx, process_idx], observed= data, dims= ('obs_id'))


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source