'Specification of prior with Laplace distribution causing issue when modeling likelihood as gamma

I'm building a MCMC model using pymc. Issue is that I appear to be misspecifying the parameters for one of the priors, which causes the model-building to stall.

Although I get no error, there is a warning upon launching model-building:

The variable specified for alpha has negative support for Gamma, likely making it unsuitable for this parameter.

I am including my code along with the data used. Note that I load in the mean and sd for the priors. Technically, this is based on more granular data and I can aggregate to get other parameters. However, it seems that most distributions have a way to work mean and sigma.

Since we know that the data fits a gamma distribution, and gamma takes parameters alpha and beta, I include a function for deriving these from mean and sigma.

I understand the warning states that the problem has something to do with my specification of the alpha, based on a Laplace distr. However, the Pymc3 doc has mu(location) and b(scale) as parameters, and the scipy spec for Laplace is the same. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.laplace.html

It could also be because I'm not specifying the location parameter that I get for the gamma distribution from scipy in the pymc specification, but not sure how to do that.

from fitter import Fitter
import numpy as np
import pandas as pd
import pymc3 as pm

# create list of cont. distributions in pymc
pymc3_avail_distr = ['normal', 'halfnorm', 'beta', 'expon', 'laplace', 
                    'laplace_asymmetric', 't', 'cauchy', 'halfcauchy',
                    'gamma', 'invgamma', 'lognorm', 'chisquare', 
                    'wald', 'pareto', 'vonmises', 'triangular',
                    'rice', 'logistic']

def get_best_distr(data):
    f = Fitter(data, 
        distributions=pymc3_avail_distr)
    f.fit()
    res = f.get_best()
    return res

observed = np.array([1470841, 1525474, 1459616, 1585676])
top_dist = get_best_distr(observed)

# gamma distribution fits the data best
print(top_dist)
{'gamma': {'a': 0.21972366541425908, 'loc': 1459615.9999999998, 'scale': 61871.048147509544}}

df_prior = pd.DataFrame({'std': [60796.278573,
                                60129.729121,
                                60923.338104,
                                62732.288489,
                                61313.038552,
                                61198.029576,
                                63192.944452,
                                61367.557455,
                                60385.663706,
                                62908.801167,
                                60548.634300,
                                60662.409589,
                                62618.118917,
                                64691.495035,
                                63116.079813,
                                61716.377629,
                                60205.848191,
                                61343.866119,
                                63962.879303],
                        'mean': [1.773200e+06,
                                1.780192e+06,
                                1.774824e+06,
                                1.774066e+06,
                                1.775952e+06,
                                1.787677e+06,
                                1.783817e+06,
                                1.785475e+06,
                                1.776615e+06,
                                1.795915e+06,
                                1.780723e+06,
                                1.778066e+06,
                                1.794443e+06,
                                1.792919e+06,
                                1.793265e+06,
                                1.776529e+06,
                                1.789222e+06,
                                1.783472e+06,
                                1.776677e+06]}, 
                         index=[np.arange(1,20)])

# According to https://docs.pymc.io/en/v3/api/distributions/continuous.html#pymc3.distributions.continuous.Gamma
def calc_alpha(x):
    return x['mean']**2 / x['std']

def calc_beta(x):
    return x['mean'] / x['std']**2

alphas = df_prior.apply(calc_alpha, axis=1).to_frame(name='alpha')
betas = df_prior.apply(calc_beta, axis=1).to_frame(name='beta')

alpha_beta = alphas.join(betas)

# Find the best-fitting distributions and associated parameters
# for the alpha and beta
alpha_best_fit = get_best_distr(alpha_beta['alpha'])
beta_best_fit = get_best_distr(alpha_beta['beta'])

print(alpha_best_fit)
{'laplace': {'loc': 51704327.58623222, 'scale': 767594.4959681442}}

print(beta_best_fit)
{'beta': {'a': 1.5774623511223087, 'b': 0.8549502179381592, 'loc': 0.00042016580933100034, 'scale': 7.34475114442029e-05}}

with pm.Model() as model1:
    alpha = pm.Laplace('alpha', mu = alpha_best_fit['laplace']['loc'], b = alpha_best_fit['laplace']['scale'])
    beta = pm.Beta('beta', alpha = beta_best_fit['beta']['a'], beta = beta_best_fit['beta']['b'])
    
    # Based on distr of raw data
    pred = pm.Gamma("posterior", alpha = alpha, beta = beta, observed=observed)

    trace = pm.sample(1000, tune=800)



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source