How to fit a discrete distribution (Boltzmann) to a large dataset?
I have a NumPy array with a large number of data points (53046323). The data represent durations and follow a discrete distribution; after some searching, I believe they can be fitted with a Boltzmann distribution (scipy.stats.boltzmann). I tried several parameter values by trial and error to find the best fit, and the best one was lambda=1.
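For reference, SciPy's `boltzmann` (with the default loc=0) has the pmf below, where `N` is the number of support points (values 0 to N-1), not the number of observations. For observations k_1, ..., k_n the log-likelihood decreases as N grows, so the maximum-likelihood N is the largest observed value plus one; the estimate of lambda then makes the model mean equal to the sample mean:

```latex
f(k;\lambda,N) = \frac{(1-e^{-\lambda})\,e^{-\lambda k}}{1-e^{-\lambda N}}, \qquad k = 0, 1, \dots, N-1

\ell(\lambda,N) = n\log\left(1-e^{-\lambda}\right) - \lambda\sum_{i=1}^{n} k_i - n\log\left(1-e^{-\lambda N}\right)

\hat N = \max_i k_i + 1, \qquad \bar k = \frac{1}{e^{\hat\lambda}-1} - \frac{\hat N}{e^{\hat\lambda \hat N}-1}
```

When N is large the second term vanishes, giving the closed form lambda ≈ log(1 + 1/k̄).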
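import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boltzmann

sa1 = np.load('all_4_daily_consec_count_baseline_list.npy', allow_pickle=True)
data = np.concatenate(sa1.tolist())  # flatten the per-cell arrays: 53046323 durations
plt.hist(data, bins=np.arange(np.min(data) - 0.5, np.max(data) + 1.5), density=True, alpha=0.5)  # one bin per integer
k = np.arange(int(np.max(data)) + 1)  # evaluate the pmf once per integer, not once per observation
plt.plot(k, boltzmann.pmf(k, 1, 53046322), 'go', markersize=9)  # N is the support size (k = 0..N-1), not the sample size
plt.show()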
But it does not fit all values well, as shown in the figure (test_boltzmann_full).
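To see where the fit departs from the data, a minimal sketch (assuming non-negative integer durations; the helper name `empirical_vs_boltzmann` is hypothetical) tabulates the observed relative frequency of each integer with `np.bincount` and evaluates `boltzmann.pmf` once per integer rather than once per observation:

```python
import numpy as np
from scipy.stats import boltzmann


def empirical_vs_boltzmann(data, lambda_, N):
    """Return (k, empirical_pmf, model_pmf) on the integer grid 0..max(data).

    Hypothetical helper. Assumes non-negative integer-valued data; NaNs are dropped.
    """
    x = np.asarray(data, dtype=float)
    x = x[~np.isnan(x)].astype(np.int64)
    counts = np.bincount(x)                  # counts[k] = number of observations equal to k
    k = np.arange(counts.size)
    empirical = counts / counts.sum()        # relative frequency of each integer value
    model = boltzmann.pmf(k, lambda_, N)     # model probability of each integer value
    return k, empirical, model
```

Usage: `k, emp, model = empirical_vs_boltzmann(data, 1, 53046322)`, then `plt.bar(k, emp, alpha=0.5)`, `plt.plot(k, model, 'go')` and `plt.yscale('log')`; the log axis makes disagreements in the tail visible.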
So I tried to fit part of the data, shown in the figure (test2), which has the same distribution shape and 585 data points:
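import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boltzmann

sa1 = np.load('all_4_daily_consec_count_baseline_list.npy', allow_pickle=True)
xx = np.reshape(sa1, (607, 484))  # one array of durations per grid cell
noov_hh = xx[306, 250].astype('float')  # pick a single cell
data = noov_hh[~np.isnan(noov_hh)]  # drop missing values: 585 durations remain
plt.hist(data, bins=np.arange(np.min(data) - 0.5, np.max(data) + 1.5), density=True, alpha=0.5)  # one bin per integer
k = np.arange(int(np.max(data)) + 1)  # integer support points
plt.plot(k, boltzmann.pmf(k, 1, 584), 'go', markersize=9)
plt.show()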
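Instead of trial and error, the parameters can be estimated by maximum likelihood. A minimal sketch, assuming the durations are non-negative integers starting at 0 (if they start at 1, subtract 1 first); `fit_boltzmann` is a hypothetical helper, not a SciPy function. Because the likelihood decreases as N grows, N is set to max(data) + 1 and only lambda needs a 1-D search with `scipy.optimize.minimize_scalar`:

```python
import numpy as np
from scipy.optimize import minimize_scalar


def fit_boltzmann(data, lam_bounds=(1e-6, 50.0)):
    """Maximum-likelihood fit of scipy.stats.boltzmann(lambda_, N); returns (lambda_hat, N_hat).

    Hypothetical helper. Assumes non-negative integer durations starting at 0 (loc=0).
    """
    k = np.asarray(data, dtype=float)
    k = k[~np.isnan(k)]                  # drop NaNs, as in the single-cell example
    n, s = k.size, k.sum()               # the likelihood depends only on n, sum(k) and max(k)
    # Smallest support that still contains every observation maximises the likelihood.
    N_hat = int(k.max()) + 1

    def neg_loglik(lam):
        # log pmf = log(1 - e^-lam) - lam*k - log(1 - e^(-lam*N));
        # -expm1(-x) equals 1 - e^-x and stays accurate for small x.
        return -(n * np.log(-np.expm1(-lam))
                 - lam * s
                 - n * np.log(-np.expm1(-lam * N_hat)))

    # The negative log-likelihood is convex in lambda, so a bounded 1-D search suffices.
    res = minimize_scalar(neg_loglik, bounds=lam_bounds, method="bounded")
    return res.x, N_hat
```

Since only the count, sum and maximum are needed, the same call works for the full 53046323-value array and for the 585-value subset: `lam_hat, N_hat = fit_boltzmann(data)`.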
How can I get the best-fitting parameters for my data in both cases? Is there an adaptive way to get a better fit?
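If the Boltzmann shape itself is wrong (for example, a heavier tail than a truncated geometric allows), tuning lambda alone may not fix it. One way to check is to fit an alternative with a second shape parameter and compare AIC. The sketch below uses the negative binomial purely as an example candidate; that choice and the helper name `compare_by_aic` are assumptions, not something established by the question:

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import nbinom


def compare_by_aic(data):
    """Fit Boltzmann and negative binomial by maximum likelihood; return {name: AIC}.

    Hypothetical helper. Lower AIC is better; both models have 2 parameters here.
    Assumes non-negative integer data; NaNs are dropped.
    """
    x = np.asarray(data, dtype=float)
    x = x[~np.isnan(x)].astype(np.int64)
    counts = np.bincount(x)
    k = np.nonzero(counts)[0]            # distinct observed values
    w = counts[k]                        # how often each value occurs
    n_obs, total = w.sum(), (w * k).sum()

    # Boltzmann(lambda_, N): MLE of N is max+1; closed-form log-likelihood in lambda_.
    N = int(k.max()) + 1
    def b_nll(lam):
        return -(n_obs * np.log(-np.expm1(-lam)) - lam * total
                 - n_obs * np.log(-np.expm1(-lam * N)))
    b = minimize_scalar(b_nll, bounds=(1e-6, 50.0), method="bounded")

    # Negative binomial(n, p): search over log(n) and logit(p) so the problem is
    # unconstrained, starting from method-of-moments estimates.
    m = total / n_obs
    v = np.sum(w * (k - m) ** 2) / n_obs
    p0 = m / v if v > m else 0.99        # not overdispersed: start near the Poisson limit
    n0 = max(m * p0 / (1 - p0), 1e-3)
    def nb_nll(theta):
        n, p = np.exp(theta[0]), 1.0 / (1.0 + np.exp(-theta[1]))
        return -np.sum(w * nbinom.logpmf(k, n, p))
    nb = minimize(nb_nll, x0=[np.log(n0), np.log(p0 / (1 - p0))], method="Nelder-Mead")

    # AIC = 2 * (number of parameters) - 2 * maximised log-likelihood
    return {"boltzmann": 2 * 2 + 2 * b.fun, "nbinom": 2 * 2 + 2 * nb.fun}
```

Other candidate distributions can be added the same way: fit each by maximum likelihood on the tabulated counts and keep the one with the lowest AIC.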
The data are available at the link (data), since the file is too large to upload here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
