PySpark DataFrame: in each group, fill zero/null values with previous rows' values, with aggregation on other columns

I have a DataFrame like the one below, and I want a new column, "expected_target", for the rows where "var_2" is zero. Each such row should be filled with the difference of "var_2" and "var_1" from the previous row. This has to happen within each group of "id", and a group can contain any number of zeros in "var_2".
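A plain-Python sketch of one reading of this rule, under the assumption that it should reproduce the sample's expected_target column. Within each id, rows keep var_2 until the first zero that follows a non-zero row. From there on the value starts at that previous row's var_2 minus its var_1 and keeps dropping by the same var_1 on every later row. The helper name `fill_target` is hypothetical and the input is assumed to be already sorted.

```python
from itertools import groupby


def fill_target(rows):
    """Hypothetical helper: rows are (id, id_2, var_1, var_2) tuples, already
    sorted by id and then by their intended order within each id."""
    out = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        anchor = None  # (var_2, var_1) of the row just before the first zero
        prev = None    # previous row within the current id group
        steps = 0      # rows processed since the anchor row
        for row in group:
            _, _, var_1, var_2 = row
            # Only a zero that follows a non-zero row starts the countdown;
            # zeros at the very start of a group are kept unchanged.
            if anchor is None and var_2 == 0 and prev is not None and prev[3] != 0:
                anchor = (prev[3], prev[2])
            if anchor is None:
                out.append(var_2)
            else:
                steps += 1
                out.append(anchor[0] - steps * anchor[1])
            prev = row
    return out


print(fill_target([(1, 0, 1, 4), (1, 1, 1, 4), (1, 1, 2, 0), (1, 1, 2, 0)]))
# [4, 4, 3, 2]
```

Once the countdown starts it continues even if var_2 later becomes non-zero again, which is what the last row of each sample group (var_2 = 1 and 21, expected 1 and 20) suggests.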

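from pyspark.sql import SparkSession

# In a Databricks notebook `spark` already exists; getOrCreate() returns it
spark = SparkSession.builder.getOrCreate()

data = [(1,0,1,4,4),
        (1,0,1,4,4),
        (1,0,1,4,4),
        (1,1,1,4,4),
        (1,1,2,0,3),
        (1,1,2,0,2),
        (1,1,2,1,1),
        (2,0,1,24,24),
        (2,0,1,24,24),
        (2,0,1,24,24),
        (2,1,1,24,24),
        (2,1,2,0,23),
        (2,1,2,0,22),
        (2,1,2,0,21),
        (2,1,2,21,20)
       ]
cols = ['id', 'id_2', 'var_1', 'var_2', 'expected_target']
data_df = spark.createDataFrame(data=data, schema=cols)
data_df.show()  # display(data_df) also works, but only inside Databricks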

Please help!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
