Rolling and keep the threshold
I have a pandas DataFrame with a date column and a variable column. I need to compare a value with the values of the next three days and, if any of them is greater than the initial value, copy that row's date into a new DataFrame. However, the threshold used for the comparison should start as the value in the first row (or first group), and it should be replaced by a new value only when that value is greater than the current threshold; otherwise the previous threshold is kept.
Below is an example:
| index | Date | variable | Threshold |
|---|---|---|---|
| 1 | 2020-04-01 | -1.128146 | Initial threshold |
| 2 | 2020-04-02 | 0.389000 | After first iteration: new threshold |
| 3 | 2020-04-03 | -0.400516 | |
| 4 | 2020-04-04 | -0.077735 | |
| 5 | 2020-04-05 | 0.5 | After second iteration: new threshold |
| 6 | 2020-04-06 | 0.45 | |
| 7 | 2020-04-07 | 0.25 | |
| 8 | 2020-04-07 | -0.3 | After third iteration: keep previous threshold (0.5) |
| 9 | 2020-04-07 | -0.2 | |
Expected dataframe:
| index | Date | variable |
|---|---|---|
| 2 | 2020-04-02 | 0.389000 |
| 5 | 2020-04-05 | 0.5 |
Solution 1:[1]
Given the following toy dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        "date": {
            1: "2020-04-01",
            2: "2020-04-02",
            3: "2020-04-03",
            4: "2020-04-04",
            5: "2020-04-05",
            6: "2020-04-06",
            7: "2020-04-07",
            8: "2020-04-07",
            9: "2020-04-07",
        },
        "variable": {
            1: -1.128146,
            2: 0.389,
            3: -0.400516,
            4: -0.077735,
            5: 0.5,
            6: 0.45,
            7: 0.25,
            8: -0.3,
            9: -0.2,
        },
    }
)
Here is one way to do it:
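# The first row provides the initial threshold
threshold = df.loc[1, "variable"]
indices = []

# Walk the remaining rows in blocks of three (labels 2-4, 5-7, 8-9);
# the "+ 1" lets the last label start a block of its own if needed
for idx in range(2, df.shape[0] + 1, 3):
    # .loc slicing is label-based and includes both endpoints
    if (new_threshold := max(df.loc[idx : idx + 2, "variable"])) > threshold:
        threshold = new_threshold
        # Keep the label of the first row holding the new threshold
        indices.append(df.loc[df["variable"] == new_threshold, "variable"].index[0])

new_df = df.loc[indices, :]
print(new_df)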
# Output
         date  variable
2  2020-04-02     0.389
5  2020-04-05     0.500
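The same idea can be wrapped in a small function, which may be handy if the block size or column name changes. This is only a sketch under a few assumptions: the function name `rolling_threshold_rows` and its parameters `column` and `window` are made up for illustration, the index labels are assumed to be unique, and `Series.idxmax` is swapped in for the equality lookup so that an earlier row with the same value cannot be picked by mistake.

```python
import pandas as pd


def rolling_threshold_rows(df, column="variable", window=3):
    """Return the rows whose block maximum exceeds the running threshold.

    Hypothetical helper: the first row supplies the initial threshold and
    the remaining rows are scanned in consecutive, non-overlapping blocks
    of `window` rows.
    """
    values = df[column]
    threshold = values.iloc[0]
    picked = []
    # Positional slicing, so the index labels do not have to be 1..n
    for start in range(1, len(values), window):
        block = values.iloc[start : start + window]
        block_max = block.max()
        if block_max > threshold:
            threshold = block_max
            # idxmax returns the label of the first maximum in the block
            picked.append(block.idxmax())
    return df.loc[picked]


df = pd.DataFrame(
    {
        "date": [
            "2020-04-01", "2020-04-02", "2020-04-03",
            "2020-04-04", "2020-04-05", "2020-04-06",
            "2020-04-07", "2020-04-07", "2020-04-07",
        ],
        "variable": [
            -1.128146, 0.389, -0.400516,
            -0.077735, 0.5, 0.45,
            0.25, -0.3, -0.2,
        ],
    },
    index=range(1, 10),
)

result = rolling_threshold_rows(df)
print(result.index.tolist())  # → [2, 5]
```

On the toy dataframe this selects the same two rows as the loop above. Passing a different `window` changes how many rows each block covers.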
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Laurent |
