'Is there an easy way to zero time with each new condition in a pandas dataframe?
I have a big-ass time series data frame where one condition changes at variable intervals. I would like to zero the time with each new condition, so I converted the categories into integers and created a new column using the .diff() to indicate the rows where the switch occurs with non-zero values. Then I made a new column, "Mod_time" as a container for the new time values that zero at each new condition. This is what I want the table to look like:
| Time | Condition | Numerical Condition | Fruit_switch | Mod_time |
|---|---|---|---|---|
| 0 | Apples | 6 | nan | 0 |
| 1 | Apples | 6 | 0 | 1 |
| 2 | Apples | 6 | 0 | 2 |
| 3 | Apples | 6 | 0 | 3 |
| 4 | Oranges | 2 | -4 | 0 |
| 5 | Oranges | 2 | 0 | 1 |
I tried iterrows:
for index, row in gas_df.iterrows():
if row['gas_switch'] != 0:
gas_df.loc[[index], ["Mod_time"]] = 0
else:
gas_df.loc[[index], ["Mod_time"]] = gas_df.loc[[str(int(index)-1)], ["Mod_time"]] + 1
But got the error "None of [Index(['0'], dtype='object')] are in the [index]" It seems that iterrows is blind to everything but the one row it's looking at.
I also tried using enumerate instead of iterrows and got the same error.
Any suggestions or search terms would be appreciated.
Solution 1:[1]
There is a whole garden variety of problems that involve cumulative sum with reset. This one can be seen as such: you'd like to do the cumulative sum of time differences, with reset when the "numerical condition" changes.
import numpy as np
def cumsum_reset(v, reset):
v = v.copy()
c = np.cumsum(~reset)
v[reset] = -np.diff(np.r_[0, c[reset]])
return np.cumsum(v)
# application
cond = df['Numerical Condition']
df['Mod_time'] = cumsum_reset(np.diff(np.r_[0, df['Time']]),cond != cond.shift())
On your data:
Time Condition Numerical Condition Fruit_switch Mod_time
0 0 Apples 6 NaN 0
1 1 Apples 6 0.0 1
2 2 Apples 6 0.0 2
3 3 Apples 6 0.0 3
4 4 Oranges 2 -4.0 0
5 5 Oranges 2 0.0 1
Edit
From the comments, it sounds like the reset should really come from when df['Condition'] (the fruit name) changes. Also, the time difference between the rows is always one. Therefore, the following should work as well:
c = df['Condition']
df['Mod_time'] = cumsum_reset(np.ones_like(c), c.shift() != c)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
