How to group an ordered dataframe into a fixed number of bins?
I have a dataframe like this:
| year | count_yes | count_no |
|---|---|---|
| 1900 | 5 | 7 |
| 1903 | 5 | 3 |
| 1915 | 14 | 6 |
| 1919 | 6 | 14 |
I want to split the rows into two bins, regardless of the year values themselves.
How can I group the rows that way and sum the values of each category?
Expected result:
| year | count_yes | count_no |
|---|---|---|
| 1900 | 10 | 10 |
| 1910 | 20 | 20 |
Logic: group the first two rows (1900 and 1903) and the last two rows (1915 and 1919), then sum the values of each category.
I want to create a 100% stacked column chart, so 1900 would be 50%/50% and 1910 would also be 50%/50%.
I've already written the function that builds this chart; I just need to reduce the dataframe to bins to get a better distribution and visualization.
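To make the 50%/50% arithmetic concrete, here is a minimal sketch that starts from the expected result above and divides each row by its row total. The names `binned` and `shares` are only illustrative and are not part of the original code.

```python
import pandas as pd

# Expected binned result from the question, with year as the index.
binned = pd.DataFrame(
    {"count_yes": [10, 20], "count_no": [10, 20]},
    index=pd.Index([1900, 1910], name="year"),
)

# Divide every row by its own total, then scale to percent.
shares = binned.div(binned.sum(axis=1), axis=0) * 100
print(shares)
```

Each row of `shares` sums to 100, which is the shape a stacked percentage chart needs.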
Solution 1:[1]
Here is one way to do what you need, if you are OK with using the decade as the index:
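# Floor each year to the start of its decade (1903 -> 1900, 1915 -> 1910)
df['year'] = (df.year // 10) * 10
# Sum the counts within each decade
df_group = df.groupby('year').sum()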
Output:
df_group
      count_yes  count_no
year
1900         10        10
1910         20        20
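Note that grouping by decade depends on the year values, so the bins will not always hold the same number of rows. If the bins should depend only on row position, as the question asks, a minimal sketch could look like the one below. It rebuilds the dataframe from the question. `n_bins`, `bin_id` and `df_bins` are names chosen here for illustration. Labelling each bin by its first year is an assumption, so the second bin comes out as 1915 rather than the 1910 shown in the expected result.

```python
import numpy as np
import pandas as pd

# Dataframe from the question.
df = pd.DataFrame(
    {
        "year": [1900, 1903, 1915, 1919],
        "count_yes": [5, 5, 14, 6],
        "count_no": [7, 3, 6, 14],
    }
)

n_bins = 2  # illustrative parameter: how many bins to produce

# Make sure the rows are ordered before binning by position.
df = df.sort_values("year").reset_index(drop=True)

# Map row positions 0..len-1 onto bin ids 0..n_bins-1 in nearly equal chunks.
bin_id = np.arange(len(df)) * n_bins // len(df)

# Sum the counts per bin and label each bin by its first (smallest) year.
df_bins = (
    df.groupby(bin_id)
    .agg(
        year=("year", "min"),
        count_yes=("count_yes", "sum"),
        count_no=("count_no", "sum"),
    )
    .reset_index(drop=True)
)
print(df_bins)
```

With the four rows above this puts two rows in each bin. When the row count is not a multiple of `n_bins`, the bin sizes differ by at most one row.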
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | P. Pinho |
