'How do I loop through df and match like values in a column?
I Have a dataframe where I would like to:
- match the like values in a column
- ensure that the dates are greater than those of the previously matched.
- perform an operation such as subtract a value from another column.
For ex:
| date | A | B |
|:---------- |:--|:--|
| 01-01-2022 |k1 | 1 |
| 01-02-2022 |c1 | 5 |
| 01-02-2022 |e1 | 4 |
| 01-05-2022 |k1 | 9 |
| 01-10-2022 |c1 | 9 |
In col A there are two k1's. Now what I would like to happen is identify the first k1 then the second, verify that the date in row 4 > row 0, then subtract the value in col B at row 0 from col B row 4 and so on until there aren't any k1's left. Then it moves on to the next value in col A etc. Then the results in a col C.
With the original data I was able to accomplish this via the groupby() function while using the shift() method which worked well. However, the one kink I had with that is I don't know how to merge that info back to the original df.
Solution 1:[1]
We can use groupby and shift for your goal.
df = pd.DataFrame({'date': ['01-01-2022',
'01-02-2022',
'01-02-2022',
'01-05-2022',
'01-10-2022'],
'A': ['k1', 'c1', 'e1', 'k1', 'c1'],
'B': [1, 5, 4, 9, 9]})
Your "verify that the date in row 4 > row 0" is checked by:
mask = df.groupby('A')['date'].transform(lambda b: b > b.shift())
and your "subtract the value in col B at row 0 from col B row 4" is done separately by
subtracted = df.groupby('A')['B'].transform(lambda b: b - b.shift())
Finally we can combine the two to get the final result, picking a value from subtracted only when the corresponding mask is True.
df['masked_subtraction'] = np.where(mask, subtracted, np.nan)
print(df)
date A B masked_subtraction
0 01-01-2022 k1 1 NaN
1 01-02-2022 c1 5 NaN
2 01-02-2022 e1 4 NaN
3 01-05-2022 k1 9 8.0
4 01-10-2022 c1 9 4.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Raymond Kwok |
