'Is there an easy way to zero time with each new condition in a pandas dataframe?

I have a big-ass time series data frame where one condition changes at variable intervals. I would like to zero the time with each new condition, so I converted the categories into integers and created a new column using the .diff() to indicate the rows where the switch occurs with non-zero values. Then I made a new column, "Mod_time" as a container for the new time values that zero at each new condition. This is what I want the table to look like:

Time Condition Numerical Condition Fruit_switch Mod_time
0 Apples 6 nan 0
1 Apples 6 0 1
2 Apples 6 0 2
3 Apples 6 0 3
4 Oranges 2 -4 0
5 Oranges 2 0 1

I tried iterrows:

for index, row in gas_df.iterrows(): 

   if row['gas_switch'] != 0:
       gas_df.loc[[index], ["Mod_time"]] = 0
   
   else:
       gas_df.loc[[index], ["Mod_time"]] = gas_df.loc[[str(int(index)-1)], ["Mod_time"]] + 1

But got the error "None of [Index(['0'], dtype='object')] are in the [index]" It seems that iterrows is blind to everything but the one row it's looking at.

I also tried using enumerate instead of iterrows and got the same error.

Any suggestions or search terms would be appreciated.



Solution 1:[1]

There is a whole garden variety of problems that involve cumulative sum with reset. This one can be seen as such: you'd like to do the cumulative sum of time differences, with reset when the "numerical condition" changes.

import numpy as np

def cumsum_reset(v, reset):
    v = v.copy()
    c = np.cumsum(~reset)
    v[reset] = -np.diff(np.r_[0, c[reset]])
    return np.cumsum(v)


# application
cond = df['Numerical Condition']
df['Mod_time'] = cumsum_reset(np.diff(np.r_[0, df['Time']]),cond != cond.shift())

On your data:

   Time Condition  Numerical Condition  Fruit_switch  Mod_time
0     0    Apples                    6           NaN         0
1     1    Apples                    6           0.0         1
2     2    Apples                    6           0.0         2
3     3    Apples                    6           0.0         3
4     4   Oranges                    2          -4.0         0
5     5   Oranges                    2           0.0         1

Edit

From the comments, it sounds like the reset should really come from when df['Condition'] (the fruit name) changes. Also, the time difference between the rows is always one. Therefore, the following should work as well:

c = df['Condition']
df['Mod_time'] = cumsum_reset(np.ones_like(c), c.shift() != c)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1