How to create a histogram of extremely right-skewed data with different bin sizes
I have a variable that is very right-skewed. Most of the observations are 0, while the maximum value is close to 4000. A plain histogram is useless because nearly all observations fall into the first bar, so I want to create bins of different sizes.
I used the following code:
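import matplotlib.pyplot as plt

bins = [0, 1, 2, 5.0, 10.0, 20.0, 30.0]
# df is my DataFrame; plt.hist returns (counts, bin_edges, patches), not a Figure
counts, edges, patches = plt.hist(df[df.year == 2021].diversification, bins=bins)
plt.xticks(bins)
plt.show()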
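But in the resulting plot the bars have different widths, because each bar spans its own bin interval (for example, [10, 20) is ten times wider than [0, 1)). Ideally, I want every bar to have the same width regardless of the interval it covers. Any idea how to implement this?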
Solution 1:[1]
You could add a new column indicating the range each value falls in. pandas' pd.cut() assigns each element to one of the intervals defined by the bin edges.
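To see what pd.cut() produces, here is a small sketch on a handful of made-up values (the series below is hypothetical, not the question's data). It also assumes one change to the bin edges: appending np.inf as the last edge, so that large values such as the question's maximum of about 4000 land in a final [30, inf) interval instead of becoming NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the skewed column
values = pd.Series([0, 0, 0, 1, 3, 7, 25, 3900])

# Left-closed intervals [0, 1), [1, 2), ..., [30, inf);
# np.inf as the last edge keeps the long tail (e.g. 3900) instead of NaN
bins = [0, 1, 2, 5, 10, 20, 30, np.inf]
ranges = pd.cut(values, bins, right=False)

# value_counts() on a categorical lists every interval, including empty ones;
# sort_index() orders them by interval rather than by count
counts = ranges.value_counts().sort_index()
print(counts)
```

Printing counts lists the seven intervals in order, with [10, 20) showing 0: value_counts() on a categorical keeps empty intervals, so an empty interval still gets its own (zero-height) slot in the bar plot.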
Then you can create a bar plot with the ranges on the x-axis and the counts on the y-axis:
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
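bins = [0, 1, 2, 5, 10, 20, 30]
# synthetic right-skewed sample: mostly zeros with a geometric tail
xs = np.random.geometric(0.2, size=1000) - 1
df = pd.DataFrame({'x': xs, 'year': 2021})
# left-closed intervals [0, 1), [1, 2), ..., [20, 30); values >= 30 become NaN
df['range'] = pd.cut(df['x'], bins, right=False)
# df['range'].value_counts().sort_index().plot.bar()  # to plot via pandas
# counts per interval in bin order; rename_axis + reset_index yields the same
# 'Range' and 'Count' columns in both pandas 1.x and 2.x
df_counts = df['range'].value_counts().sort_index().rename_axis('Range').reset_index(name='Count')
# each interval is a separate category, so every bar gets the same width
ax = sns.barplot(data=df_counts, x='Range', y='Count', palette='flare')
plt.show()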
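If you would rather not depend on seaborn, a similar equal-width bar chart can be built with NumPy and matplotlib alone. This is a sketch of an alternative, not part of the original answer: it computes the counts with np.histogram on a synthetic sample from np.random.default_rng, and it assumes np.inf as the last bin edge so that values above 30 (the question's data reaches about 4000) are counted rather than dropped.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical skewed sample: mostly zeros plus a long geometric tail
rng = np.random.default_rng(0)
xs = rng.geometric(0.2, size=1000) - 1

# Uneven bin edges; np.inf as the last edge catches everything >= 30
edges = [0, 1, 2, 5, 10, 20, 30, np.inf]
counts, _ = np.histogram(xs, bins=edges)

# Draw each bar at an integer position so all bars share the same width,
# then label each position with the interval it represents
positions = np.arange(len(counts))
labels = [f"[{lo:g}, {hi:g})" for lo, hi in zip(edges[:-1], edges[1:])]

fig, ax = plt.subplots()
ax.bar(positions, counts, width=0.9)
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("x")
ax.set_ylabel("Count")
plt.show()
```

Because the bars sit at positions 0, 1, 2, ... rather than at the bin edges, a bar's width no longer reflects the width of its interval; the tick labels carry that information instead.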
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |


