How to fit a discrete distribution (Boltzmann) to a large dataset?

I have a NumPy array with a large number of data points (53,046,323). The data represent durations and follow a discrete distribution; after some searching, I believe a Boltzmann distribution could fit them. I tried several parameter values by hand to find the best fit, and the best result was with lambda=1.
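For reference, SciPy documents `boltzmann` as a truncated geometric (discrete exponential) distribution on k = 0, 1, ..., N-1 with pmf(k) = (1 - exp(-lambda)) exp(-lambda k) / (1 - exp(-lambda N)). So `N` is the number of allowed values (the truncation point), not the number of observations. The sketch below checks that formula against `scipy.stats.boltzmann.pmf`; the helper name `boltzmann_pmf` and the values lambda = 1, N = 10 are illustrative choices only.

```python
import numpy as np
from scipy.stats import boltzmann

def boltzmann_pmf(k, lam, N):
    """pmf(k) = (1 - e^-lam) * e^(-lam*k) / (1 - e^(-lam*N)) for k = 0..N-1, else 0."""
    k = np.asarray(k, dtype=float)
    in_support = (k >= 0) & (k < N)
    pmf = (1 - np.exp(-lam)) * np.exp(-lam * k) / (1 - np.exp(-lam * N))
    return np.where(in_support, pmf, 0.0)

# Illustrative values only (not fitted to the question's data)
lam, N = 1.0, 10
k = np.arange(0, 12)  # includes k = 10, 11, which lie outside the support 0..N-1
manual = boltzmann_pmf(k, lam, N)
print(np.allclose(manual, boltzmann.pmf(k, lam, N)))  # True
print(np.isclose(manual.sum(), 1.0))                   # True
```

Because exp(-lambda N) is vanishingly small once N is much larger than the observed values, passing the sample size as `N` barely changes the pmf for lambda = 1; it just does not mean what the name suggests.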

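import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boltzmann
sa1 = np.load('all_4_daily_consec_count_baseline_list.npy', allow_pickle=True)
nn = sa1.tolist()
data = np.concatenate(nn)  # flatten all cells into one 1-D array of durations
plt.hist(data, bins=int(np.max(data)), density=True, alpha=0.5)
k = np.unique(data)  # evaluate the pmf once per distinct value, not 53 million times
# boltzmann.pmf(k, lambda_, N): N is the truncation point (support 0..N-1), not the sample size
plt.plot(k, boltzmann.pmf(k, 1, 53046322), 'go', markersize=9)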
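Rather than trying values of lambda by hand, both parameters can be estimated by maximum likelihood. A minimal sketch, assuming the durations are non-negative integers starting at 0 (if the shortest possible duration is 1, subtracting 1 first, or equivalently using `loc=1` in SciPy, would be needed). For any lambda > 0 the likelihood only falls as N grows, so the maximum-likelihood N is the smallest value that still covers the data, max(data) + 1; lambda then reduces to a one-dimensional bounded optimization. The log-likelihood depends on the data only through the count n and the sum of the values, so the fit needs a single pass over the 53 million points. The name `fit_boltzmann` and the search interval (1e-8, 50) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import boltzmann

def fit_boltzmann(data):
    """Maximum-likelihood fit of scipy.stats.boltzmann (lambda_, N).

    Assumes non-negative integer data. N is set to max(data) + 1 (its MLE);
    lambda_ is then found numerically with N held at that value.
    """
    data = np.asarray(data, dtype=np.int64)
    N = int(data.max()) + 1
    n = data.size
    s = float(data.sum())  # the likelihood depends on the data only through n and sum(k)

    def neg_loglik(lam):
        # log L = n*log(1 - e^-lam) - lam*sum(k) - n*log(1 - e^(-lam*N)); expm1 keeps this stable
        return -(n * np.log(-np.expm1(-lam)) - lam * s - n * np.log(-np.expm1(-lam * N)))

    # The log-likelihood is concave in lambda_, so a bounded 1-D search finds the maximum
    res = minimize_scalar(neg_loglik, bounds=(1e-8, 50.0), method="bounded")
    return res.x, N

# Usage on synthetic data (true lambda_ = 1, true N = 50), not the question's data
rng = np.random.default_rng(0)
sample = boltzmann.rvs(1.0, 50, size=100_000, random_state=rng)
lam_hat, N_hat = fit_boltzmann(sample)
print(lam_hat, N_hat)
```

Here `N_hat` comes out far below the true 50, because values near 50 essentially never occur when lambda = 1; the fitted pmf is nonetheless practically identical, since the truncation term is negligible either way. The estimate of lambda is the part that matters.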

But it does not fit all values, as shown in figure (test_boltzmann_full).
[Figure: test_boltzmann_full]
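To see which values are poorly fitted, it can help to compare the observed relative frequency of each value with the model pmf directly, instead of overlaying a marker for every data point on a histogram. A sketch, assuming non-negative integer durations; `observed_vs_expected` is a hypothetical helper, and the example uses synthetic data drawn from the model itself, so the two columns should agree closely.

```python
import numpy as np
from scipy.stats import boltzmann

def observed_vs_expected(data, lam, N):
    """Observed relative frequency and Boltzmann pmf for each k in 0..max(data).

    Assumes non-negative integer data; np.bincount does the counting in one pass.
    """
    data = np.asarray(data, dtype=np.int64)
    counts = np.bincount(data)  # counts[k] = number of observations equal to k
    k = np.arange(counts.size)
    observed = counts / data.size
    expected = boltzmann.pmf(k, lam, N)
    return k, observed, expected

# Synthetic example (not the question's data): model and data agree by construction
rng = np.random.default_rng(1)
sample = boltzmann.rvs(1.0, 20, size=50_000, random_state=rng)
k, observed, expected = observed_vs_expected(sample, 1.0, 20)
for kk, o, e in zip(k, observed, expected):
    print(f"{kk:3d}  observed={o:.4f}  expected={e:.4f}")
```

Plotting `observed` and `expected` against `k` gives a per-value version of the histogram comparison; large, systematic gaps at particular values of k indicate where a single-lambda Boltzmann model falls short.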

So I tried fitting a subset of the data (585 data points), which has the same distribution shape, as shown in figure (test2):

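import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boltzmann
sa1 = np.load('all_4_daily_consec_count_baseline_list.npy', allow_pickle=True)
xx = np.reshape(sa1, (607, 484))  # reshape to the 607 x 484 grid of cells
noov_h = xx[306, 250]  # durations for a single cell
noov_hh = noov_h.astype('float')
data = noov_hh[~np.isnan(noov_hh)]  # drop missing values
plt.hist(data, bins=int(np.max(data)), density=True, alpha=0.5)
k = np.unique(data)  # evaluate the pmf once per distinct value
plt.plot(k, boltzmann.pmf(k, 1, 584), 'go', markersize=9)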

[Figure: test2]
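For a single cell such as the 585-point subset, there is also a closed form when N is large enough that truncation is negligible, as with the N = 584 used above. In that limit the Boltzmann distribution is a geometric distribution on 0, 1, 2, ..., whose mean is 1 / (exp(lambda) - 1), so the maximum-likelihood estimate is lambda = log(1 + 1/mean). A sketch under that assumption, and assuming durations start at 0; `lambda_geometric_limit` is a hypothetical name and the cells below are synthetic.

```python
import numpy as np
from scipy.stats import boltzmann

def lambda_geometric_limit(values):
    """Closed-form lambda estimate when truncation at N is negligible.

    For P(k) = (1 - e^-lam) * e^(-lam*k), k = 0, 1, 2, ..., the mean is
    1 / (e^lam - 1), so the MLE is lam = log(1 + 1/mean). Requires mean > 0.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop missing entries, as in the question
    return np.log1p(1.0 / values.mean())

# Synthetic cells with known lambda (not the question's data)
rng = np.random.default_rng(2)
true_lams = [0.5, 1.0, 2.0]
cells = [boltzmann.rvs(lam, 200, size=5_000, random_state=rng).astype(float) for lam in true_lams]
cells[0][:10] = np.nan  # simulate missing values in one cell
estimates = [lambda_geometric_limit(c) for c in cells]
print(np.round(estimates, 2))
```

Because each estimate needs only a mean, the same function could be mapped over every cell of the grid (for example over `xx.ravel()`) to get one lambda per cell, provided every cell is non-empty with a positive mean.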

How can I get the best-fitting parameters for my data in both cases? Is there an adaptive way to get a better fit?
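On the "adaptive" part, one common approach, offered here only as a sketch, is to fit several candidate discrete families by maximum likelihood and compare them with the Akaike information criterion (AIC = 2 x number of parameters - 2 x log-likelihood), keeping whichever scores lowest. Poisson is included purely as an illustrative second candidate; other families (for example negative binomial) could be added the same way. `compare_aic` is a hypothetical helper, and counting N as a fitted Boltzmann parameter is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import boltzmann, poisson

def compare_aic(data):
    """AIC for two candidate families fitted by maximum likelihood (lower is better).

    Assumes non-negative integer data. Likelihoods are computed from per-value
    counts, so each evaluation costs O(max(data)) rather than O(len(data)).
    """
    data = np.asarray(data, dtype=np.int64)
    counts = np.bincount(data)
    k = np.nonzero(counts)[0]  # only values that actually occur
    c = counts[k]

    # Boltzmann: N fixed at its MLE, max(data) + 1; lambda_ found numerically
    N = int(k.max()) + 1
    res = minimize_scalar(lambda lam: -np.sum(c * boltzmann.logpmf(k, lam, N)),
                          bounds=(1e-8, 50.0), method="bounded")
    ll_boltzmann = -res.fun

    # Poisson: the MLE of mu is the sample mean
    ll_poisson = np.sum(c * poisson.logpmf(k, data.mean()))

    # AIC = 2 * (number of fitted parameters) - 2 * log-likelihood
    return {"boltzmann": 2 * 2 - 2 * ll_boltzmann, "poisson": 2 * 1 - 2 * ll_poisson}

# Synthetic example (not the question's data)
rng = np.random.default_rng(3)
sample = boltzmann.rvs(1.0, 30, size=20_000, random_state=rng)
scores = compare_aic(sample)
print(min(scores, key=scores.get), scores)
```

AIC only ranks the candidates supplied; it does not say whether the best of them fits well, so checking observed against expected frequencies remains worthwhile.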

The data is too large to upload here, so it is available at the link (data).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
