PySpark DataFrame: within each group, fill zero/null values from the previous rows' values, with aggregation on other columns
I have the DataFrame below and want to add a new column, "expected_target". Wherever "var_2" is zero, it should be filled with the difference of "var_2" and "var_1" from the previous row. This should happen within each group of "id", and there may be any number of zero values in "var_2". The last column of the sample data shows the output I expect.
data = [(1,0,1,4,4),
(1,0,1,4,4),
(1,0,1,4,4),
(1,1,1,4,4),
(1,1,2,0,3),
(1,1,2,0,2),
(1,1,2,1,1),
(2,0,1,24,24),
(2,0,1,24,24),
(2,0,1,24,24),
(2,1,1,24,24),
(2,1,2,0,23),
(2,1,2,0,22),
(2,1,2,0,21),
(2,1,2,21,20)
]
cols = ['id','id_2','var_1','var_2','expected_target']
data_df = spark.createDataFrame(data=data,schema=cols)
display(data_df)
Please help!
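The stated rule does not pin down every row of the sample: the last row of id 2 has var_2 = 21 but expected_target = 20. As a rough sketch, here is one guessed rule that does reproduce the whole expected_target column: within each "id", start from the group's first "var_2" and subtract a running total of "var_1" - 1. It is written in plain Python rather than PySpark so the rule itself can be checked. The name `fill_target` and the rule are assumptions, not something confirmed in the question.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (id, id_2, var_1, var_2, expected_target).
rows = [
    (1, 0, 1, 4, 4), (1, 0, 1, 4, 4), (1, 0, 1, 4, 4), (1, 1, 1, 4, 4),
    (1, 1, 2, 0, 3), (1, 1, 2, 0, 2), (1, 1, 2, 1, 1),
    (2, 0, 1, 24, 24), (2, 0, 1, 24, 24), (2, 0, 1, 24, 24), (2, 1, 1, 24, 24),
    (2, 1, 2, 0, 23), (2, 1, 2, 0, 22), (2, 1, 2, 0, 21), (2, 1, 2, 21, 20),
]


def fill_target(rows):
    """Hypothetical helper implementing the ASSUMED rule:
    target = first var_2 of the group - running sum of (var_1 - 1)."""
    out = []
    # groupby only merges adjacent rows, so rows of one id must be contiguous
    # and already in the intended order (true for the sample data).
    for _, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        start = group[0][3]  # first var_2 of this id
        consumed = 0         # running sum of (var_1 - 1)
        for _id, _id_2, var_1, _var_2, _expected in group:
            consumed += var_1 - 1
            out.append(start - consumed)
    return out


computed = fill_target(rows)
# The assumed rule matches the expected_target column of the sample.
assert computed == [r[4] for r in rows]
```

If this reading is right, the same running computation in Spark would need a window partitioned by "id" with an explicit ordering column. The sample has no such column, so row order would have to be captured first.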
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow