'Deleting rows which sum to zero in 1 column (like pairs) but some elements in other columns are different
Consider a dataframe like
| product | date | value | city | reference |
|---|---|---|---|---|
| y | 21-01-2020 | -16 | Paris | xji |
| x | 25-01-2020 | -16 | Munich | zyc |
| x | 01-05-2020 | 16 | Munich | dkj |
| y | 20-01-2020 | -20 | Paris | jkdl |
| z | 21-01-2020 | 30 | Munich | 212jio |
| t | 05-01-2020 | 2 | Munich | jidi1 |
| y | 21-03-2020 | 20 | Paris | reg1 |
I would like to delete rows that value cancel each other, having something like
| product | date | value | city | reference |
|---|---|---|---|---|
| y | 21-01-2020 | -16 | Paris | xji |
| z | 21-01-2020 | 30 | Munich | 212jio |
| t | 05-01-2020 | 2 | Munich | jidi1 |
Solution 1:[1]
We aggregate by city and by abs(value) to match transactions, then add them up and remove those matched transactions that add up to 0:
df2 = (df.groupby([df['city'], abs(df['value'])])
.agg({'product':'first','city':'first', 'date':'first','value':sum, 'reference':'first'})
.reset_index(drop = True)
)
df2.where(df2['value']!=0).dropna()
output:
product city date value reference
0 t Munich 05-01-2020 2.0 jidi1
2 z Munich 21-01-2020 30.0 212jio
3 y Paris 21-01-2020 -16.0 xji
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
