How to slice dataframe based on increment of value in a column?
import pandas as pd

frame = pd.Series([4051, 4052, 4053, 4054, 4060, 4061])
heat = pd.Series([52, 51, 50, 52, 53, 50])
df_1 = pd.DataFrame()
df_1['Frame'] = frame
df_1['Heat'] = heat
I have a dataframe df_1. I want to retrieve a dataframe df_2 that contains only the rows of df_1 for which the increment of Frame from one row to the next is less than or equal to 3. As soon as the increment is larger, the search should stop.
I tried this:
i = 0
df_2 = pd.DataFrame()
for i in df_1['Frame']:
    j = i+1
    if (df_1['Frame'][j] - df_1['Frame'][i]) > 3:
        break
    else:
        df_2.append(i)
It results in an error. Can you find my mistake? If possible, I would prefer a solution without a loop since loops tend to be slow.
My desired output would be:
frame = pd.Series([4051, 4052, 4053, 4054])
heat = pd.Series([52, 51, 50, 52])
df_1 = pd.DataFrame()
df_1['Frame'] = frame
df_1['Heat'] = heat
Solution 1:[1]
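Use Series.diff to get the row-to-row increments, compare them with Series.gt to flag increments greater than 3, and apply Series.cummax so that every row from the first flagged one onward is marked. Then invert the mask with ~ (bitwise NOT) and filter with boolean indexing: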
df_1 = df_1[~df_1['Frame'].diff().gt(3).cummax()]
print (df_1)
   Frame  Heat
0   4051    52
1   4052    51
2   4053    50
3   4054    52
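
To see why the one-liner works, it may help to split the mask into its intermediate steps. The sketch below rebuilds the sample data from the question. The names step, too_large and stop are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question.
df_1 = pd.DataFrame({
    "Frame": [4051, 4052, 4053, 4054, 4060, 4061],
    "Heat": [52, 51, 50, 52, 53, 50],
})

# Step 1: increment relative to the previous row (the first row has no
# predecessor, so its value is NaN).
step = df_1["Frame"].diff()      # NaN, 1, 1, 1, 6, 1

# Step 2: flag increments greater than 3 (NaN compares as False).
too_large = step.gt(3)           # False, False, False, False, True, False

# Step 3: once a flag is True, keep it True for every later row.
stop = too_large.cummax()        # False, False, False, False, True, True

# Step 4: keep only the rows before the first large increment.
df_2 = df_1[~stop]
print(df_2)
```

The cummax step is what makes the search stop. Without it, the last row (4061, increment 1) would pass the filter again even though it comes after the jump from 4054 to 4060.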
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
