Lag/shift function in PySpark

I want to shift the number of orders by lags ranging from 1 to 7. Here is my Python (pandas) code:

def make_features(data, max_lag):    
    for lag in range(1, max_lag + 1):
        data['lag_{}'.format(lag)] = data['num_orders'].shift(lag)

make_features(df, 7)

I tried to do the same thing in PySpark. Code:

def make_features(data, max_lag):    
    for lag in range(1, max_lag + 1):
        data['lag_{}'.format(lag)] = data['num_orders'].shift(lag)

make_features(df, 7)   

I get this error:

    Traceback (most recent call last):
    TypeError: 'int' object is not callable

I also tried this code:

for lag in range(1, 8):
    window = Window.orderBy("date")
    lagCol = lag(col("num_orders"), n).over(window)
    df.withColumn(f"LagCol_{n}", lagCol)

This only shifts the column by 1 unit.

Expected result: (image not available)



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
