Pandas group by selected dates
I have a dataframe that is very similar to this dataframe:
| index | date | month |
|---|---|---|
| 0 | 2019-12-1 | 12 |
| 1 | 2020-03-1 | 3 |
| 2 | 2020-07-1 | 7 |
| 3 | 2021-02-1 | 2 |
| 4 | 2021-09-1 | 9 |
I want to move each date to the closest month from a fixed set of months. The months need to be normalized like this:
| Months | Normalized month |
|---|---|
| 3, 4, 5 | 4 |
| 6, 7, 8, 9 | 8 |
| 1, 2, 10, 11, 12 | 12 |
So the expected output would be:
| index | date | month |
|---|---|---|
| 0 | 2019-12-1 | 12 |
| 1 | 2020-04-1 | 4 |
| 2 | 2020-08-1 | 8 |
| 3 | 2020-12-1 | 12 |
| 4 | 2021-08-1 | 8 |
Solution 1:[1]
You can create a dictionary that maps each month to its normalized month:
norm_month_dict = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8, 1: 12, 2: 12, 10: 12, 11: 12, 12: 12}
Then use this dictionary to map the values in the month column to their normalized months:
df['normalized_month'] = df['month'].map(norm_month_dict)
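If typing the dictionary by hand feels error-prone, it can also be derived from the grouping table in the question. The following is only a sketch: the `groups` name and the sample dataframe are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Normalized month -> months that collapse onto it (taken from the question's table).
# `groups` is a hypothetical helper name, not part of the original answer.
groups = {4: [3, 4, 5], 8: [6, 7, 8, 9], 12: [1, 2, 10, 11, 12]}

# Invert the grouping so that every month points at its normalized month.
norm_month_dict = {m: target for target, months in groups.items() for m in months}

# Sample data mirroring the dataframe in the question.
df = pd.DataFrame({
    "date": ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
    "month": [12, 3, 7, 2, 9],
})

# Months missing from the dictionary would become NaN; here all 12 are covered.
df["normalized_month"] = df["month"].map(norm_month_dict)
print(df)
```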
Solution 2:[2]
You need to construct a dictionary from the second dataframe (assuming the two tables are dataframes df1 and df2, with the Months column of df2 stored as comma-separated strings):
d = (
    df2.assign(Months=df2['Months'].str.split(', '))
       .explode('Months').astype(int)
       .set_index('Months')['Normalized month'].to_dict()
)
# {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8, 1: 12, 2: 12, 10: 12, 11: 12, 12: 12}
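To run the snippet above on its own, `df2` has to exist first. A minimal sketch, assuming `df2` holds the question's second table with `Months` as strings separated by a comma and a space (the question does not show how that table is actually stored):

```python
import pandas as pd

# Assumed layout of the second table: Months as comma-separated strings.
df2 = pd.DataFrame({
    "Months": ["3, 4, 5", "6, 7, 8, 9", "1, 2, 10, 11, 12"],
    "Normalized month": [4, 8, 12],
})

d = (
    df2.assign(Months=df2["Months"].str.split(", "))  # string -> list of strings
       .explode("Months")                             # one row per month
       .astype(int)                                   # cast both columns to int
       .set_index("Months")["Normalized month"]
       .to_dict()
)
print(d)
```

If the separator in the real data is different (for example a comma with no space), the argument to `str.split` would need to change accordingly.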
Then map the values:
df1['month'] = df1['month'].map(d)
output:
index date month
0 0 2019-12-1 12
1 1 2020-03-1 4
2 2 2020-07-1 8
3 3 2021-02-1 12
4 4 2021-09-1 8
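Note that this output changes only the month column, while the expected output in the question also moves the date itself, with February 2021 becoming December 2020. A rough sketch of that extra step follows, assuming the date column holds strings such as `2019-12-1` and that January and February should roll back to December of the previous year, as the expected output suggests. The names `year`, `norm`, and `out` are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
    "month": [12, 3, 7, 2, 9],
})
d = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8, 1: 12, 2: 12, 10: 12, 11: 12, 12: 12}

# Year taken from the leading part of the date string (assumes YYYY-M-D strings).
year = df1["date"].str.split("-").str[0].astype(int)
norm = df1["month"].map(d)

# Assumption: January and February belong to December of the previous year.
year = year - df1["month"].isin([1, 2]).astype(int)

# Assemble the normalized dates from year/month/day columns.
out = df1.assign(
    date=pd.to_datetime(pd.DataFrame({"year": year, "month": norm, "day": 1})),
    month=norm,
)
print(out)
```

The resulting `date` column is a datetime column rather than the original strings; `out["date"].dt.strftime(...)` can convert it back if a string format is needed.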
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kshitij Srivastava |
| Solution 2 | mozway |
