Lag/shift function in PySpark
I want to create lagged copies of the number of orders, shifted by each lag from 1 to 7. Here is my Python (pandas) code:
```python
def make_features(data, max_lag):
    for lag in range(1, max_lag + 1):
        data['lag_{}'.format(lag)] = data['num_orders'].shift(lag)

make_features(df, 7)
```
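To make the target output concrete, here is the pandas function from the question run on a small made-up frame. The values are illustrative only, and the rows are assumed to already be in date order, because pandas `shift` works purely on row position.

```python
import pandas as pd


def make_features(data: pd.DataFrame, max_lag: int) -> None:
    """Add lag_1 ... lag_<max_lag> columns in place (the pandas version from the question)."""
    for lag in range(1, max_lag + 1):
        # shift(lag) moves every value down by `lag` rows; the first `lag` rows become NaN
        data["lag_{}".format(lag)] = data["num_orders"].shift(lag)


# Made-up toy data: one row per day, already sorted by date
df = pd.DataFrame({"num_orders": [10, 20, 30, 40, 50, 60, 70, 80, 90]})
make_features(df, 7)
print(df)
```

Note that pandas mutates `df` in place here; that is exactly the part that does not translate to Spark.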
I tried to do the same thing in PySpark:
```python
def make_features(data, max_lag):
    for lag in range(1, max_lag + 1):
        data['lag_{}'.format(lag)] = data['num_orders'].shift(lag)

make_features(df, 7)
```
I get this error:

```
Traceback (most recent call last):
TypeError: 'int' object is not callable
```
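I also tried this code:

```python
from pyspark.sql import Window
from pyspark.sql.functions import col, lag

for lag in range(1, 8):
    window = Window.orderBy("date")  # order rows by date
    lagCol = lag(col("num_orders"), n).over(window)
    df.withColumn(f"LagCol_{n}", lagCol)
```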
The result is only shifted by 1 unit.
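Two things in the last snippet are worth checking. First, the loop variable is named `lag`, so inside the loop the name `lag` refers to an int rather than to the imported `lag` function, and `lag(col("num_orders"), n)` then raises `TypeError: 'int' object is not callable`. Second, `n` is not the loop variable, so every iteration would build the same shift, and the DataFrame returned by `withColumn` is never assigned back to `df`. The name clash can be reproduced in plain Python; the `lag`, `broken`, and `fixed` helpers below are hypothetical stand-ins, not Spark code.

```python
def lag(values, n):
    """Hypothetical stand-in for a lag function: shift a list down by n positions."""
    return ([None] * n + values)[: len(values)]


def broken(values, max_lag):
    out = []
    for lag in range(1, max_lag + 1):  # rebinds `lag` to an int inside this function
        out.append(lag(values, lag))  # calls the int, so this raises TypeError
    return out


def fixed(values, max_lag):
    # A loop variable with a different name (`n`) keeps the lag() function visible.
    return [lag(values, n) for n in range(1, max_lag + 1)]


try:
    broken([10, 20, 30], 2)
except TypeError as exc:
    print(exc)  # 'int' object is not callable

print(fixed([10, 20, 30], 2))  # [[None, 10, 20], [None, None, 10]]
```

In the Spark version the same fix applies: iterate with a name such as `n`, and reassign the result (`df = df.withColumn(...)`) so each new column is kept.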
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

