'How do I loop through df and match like values in a column?

I Have a dataframe where I would like to:

  1. match the like values in a column
  2. ensure that the dates are greater than those of the previously matched.
  3. perform an operation such as subtract a value from another column.

For ex:

|       date | A | B |
|:---------- |:--|:--|
| 01-01-2022 |k1 | 1 |
| 01-02-2022 |c1 | 5 |
| 01-02-2022 |e1 | 4 |
| 01-05-2022 |k1 | 9 |
| 01-10-2022 |c1 | 9 |

In col A there are two k1's. Now what I would like to happen is identify the first k1 then the second, verify that the date in row 4 > row 0, then subtract the value in col B at row 0 from col B row 4 and so on until there aren't any k1's left. Then it moves on to the next value in col A etc. Then the results in a col C.

With the original data I was able to accomplish this via the groupby() function while using the shift() method which worked well. However, the one kink I had with that is I don't know how to merge that info back to the original df.



Solution 1:[1]

We can use groupby and shift for your goal.

df = pd.DataFrame({'date': ['01-01-2022',
  '01-02-2022',
  '01-02-2022',
  '01-05-2022',
  '01-10-2022'],
 'A': ['k1', 'c1', 'e1', 'k1', 'c1'],
 'B': [1, 5, 4, 9, 9]})

Your "verify that the date in row 4 > row 0" is checked by:

mask = df.groupby('A')['date'].transform(lambda b: b > b.shift())

and your "subtract the value in col B at row 0 from col B row 4" is done separately by

subtracted = df.groupby('A')['B'].transform(lambda b: b - b.shift())

Finally we can combine the two to get the final result, picking a value from subtracted only when the corresponding mask is True.

df['masked_subtraction'] = np.where(mask, subtracted, np.nan)
print(df)

         date   A  B  masked_subtraction
0  01-01-2022  k1  1                 NaN
1  01-02-2022  c1  5                 NaN
2  01-02-2022  e1  4                 NaN
3  01-05-2022  k1  9                 8.0
4  01-10-2022  c1  9                 4.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Raymond Kwok