Rolling Apply Conditional Function to Pandas Series
I have a simple labelling method that I would like to apply on a rolling basis to a Pandas Series.
The idea is to look at a rolling N-day window and classify whether each observation is above a threshold, below it, or in between.
For example:
Threshold = 1
If the value is above the threshold, the label is 1; if it is below the negative threshold, the label is 0; otherwise the label is 2.
Current implementation below:
import pandas as pd

# A pandas Series of pct returns
pct_returns = pd.Series([0.01, 0.03, 0.07, 0.05, 0.001, 0.01, 0.05, 0.05])

def label(s, threshold):
    if s >= threshold:
        return 1
    else:
        return 0 if s <= -threshold else 2

# apply on rolling basis
labels = pct_returns.rolling(20).apply(label, args=(0.05,))
Sadly, the above implementation raises a TypeError: must be real number, not NoneType
Desired outcome:
I want to label the pct_returns Series based on the next rolling n days: if the pct_return within that window is greater than the threshold or less than the negative threshold, it is classified accordingly.
Any help greatly appreciated.
Solution 1:[1]
The logic is not fully clear, but you need to apply the rolling aggregation to the numeric Series first. That gives one value per row, which you can then map to your labels.
Here is an example using a 2-day rolling window and the maximum value per window:
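import numpy as np

# rolling maximum over a trailing 2-row window (min_periods=1 avoids a NaN on the first row)
s = pct_returns.rolling(2, min_periods=1).max()
threshold = 0.05
# first matching condition wins; rows matching neither condition get the default label 2
labels = np.select([s.gt(threshold), s.le(-threshold)], [1, 0], 2)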
output: array([2, 2, 1, 1, 2, 2, 2, 2])
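For reference, a self-contained sketch of the same approach that also assembles the per-row comparison table might look like the following. The column names `data`, `roll_2_max` and `labels` and the variable `out` are chosen here for illustration only; they are assumptions, not part of the original snippet.

```python
import numpy as np
import pandas as pd

# Sample data from the question
pct_returns = pd.Series([0.01, 0.03, 0.07, 0.05, 0.001, 0.01, 0.05, 0.05])
threshold = 0.05

# Trailing 2-row window; min_periods=1 lets the first row use a 1-row window
roll_2_max = pct_returns.rolling(2, min_periods=1).max()

# np.select returns the choice of the first condition that is True,
# and the default (2) where no condition matches
labels = np.select(
    [roll_2_max.gt(threshold), roll_2_max.le(-threshold)],
    [1, 0],
    default=2,
)

# Side-by-side view of the input, the rolling statistic and the label
out = pd.DataFrame({"data": pct_returns, "roll_2_max": roll_2_max, "labels": labels})
print(out)
```

Note that `gt` is a strict comparison, so a rolling maximum exactly equal to the threshold falls into the "in between" class; `ge` would be the counterpart of the `>=` used in the question.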
detailed output as DataFrame:
    data  roll_2_max  labels
0  0.010        0.01       2
1  0.030        0.03       2
2  0.070        0.07       1
3  0.050        0.07       1
4  0.001        0.05       2
5  0.010        0.01       2
6  0.050        0.05       2
7  0.050        0.05       2
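The question asks about the next n days, whereas the example above uses a trailing window. If a forward-looking window is what is wanted, one possible sketch uses `pandas.api.indexers.FixedForwardWindowIndexer`. This rests on several assumptions that the question does not confirm: the window covers the current row plus the following n - 1 rows, label 1 means the window maximum reaches the threshold, label 0 means the window minimum reaches the negative threshold, and label 1 takes priority when both hold. The look-ahead length `n = 2` is a hypothetical value.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Sample data from the question
pct_returns = pd.Series([0.01, 0.03, 0.07, 0.05, 0.001, 0.01, 0.05, 0.05])
threshold = 0.05
n = 2  # hypothetical look-ahead length, in rows

# Each window starts at the current row and extends forward n rows in total
indexer = FixedForwardWindowIndexer(window_size=n)
fwd = pct_returns.rolling(window=indexer, min_periods=1)
fwd_max = fwd.max()
fwd_min = fwd.min()

# Assumed rule: 1 if the window maximum is >= threshold,
# 0 if the window minimum is <= -threshold, otherwise 2
labels = pd.Series(
    np.select([fwd_max.ge(threshold), fwd_min.le(-threshold)], [1, 0], default=2),
    index=pct_returns.index,
)
print(labels)
```

With `min_periods=1`, the last rows are labelled from shortened windows; dropping that argument would leave them as NaN in `fwd_max` and `fwd_min`, and `np.select` would then assign them the default label.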
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
