Pandas ffill() equivalent in PySpark
I have a dataframe with missing values within a row, and in Pandas I use
df.ffill(axis=1, inplace=True) to forward-fill them.
I want to understand the PySpark equivalent way to achieve this. I have read about using Window functions, but those fill down a column (across rows) rather than across the columns of a row.
Example:
Input:
| id | value1 | value2 | value3 | value4 | value5 |
|---|---|---|---|---|---|
| A | 2 | 3 | NaN | NaN | 6 |
| B | 1 | NaN | NaN | NaN | NaN |
Output:
| id | value1 | value2 | value3 | value4 | value5 |
|---|---|---|---|---|---|
| A | 2 | 3 | 3 | 3 | 6 |
| B | 1 | 1 | 1 | 1 | 1 |
Solution 1:[1]
You can use coalesce, which returns the first non-null value among its arguments: here it keeps value3 when it is not null and otherwise falls back to value2.
```python
from pyspark.sql.functions import coalesce

df = df.withColumn('value3', coalesce('value3', 'value2'))
```
To apply this to the whole dataset, loop over the value columns and coalesce each one with the column to its left. Because each withColumn builds on the column that was just filled, a value propagates rightwards even across several consecutive nulls:
```python
from pyspark.sql.functions import coalesce

cols = df.columns  # ['id', 'value1', 'value2', 'value3', 'value4', 'value5']
for i in range(2, len(cols)):  # start at value2 so 'id' is never used as a fill value
    df = df.withColumn(cols[i], coalesce(cols[i], cols[i - 1]))
```
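For reference, here is a minimal, self-contained sketch of the whole flow. It assumes the missing entries arrive as real nulls (coalesce skips nulls but not float NaN) and uses the column names from the example above:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import coalesce

spark = SparkSession.builder.getOrCreate()

# Recreate the input from the question; Python None becomes null in Spark
df = spark.createDataFrame(
    [("A", 2, 3, None, None, 6),
     ("B", 1, None, None, None, None)],
    schema="id string, value1 int, value2 int, value3 int, value4 int, value5 int",
)

# Row-wise forward fill: each value column falls back to the already-filled column on its left
cols = df.columns
for i in range(2, len(cols)):
    df = df.withColumn(cols[i], coalesce(cols[i], cols[i - 1]))

df.show()
# +---+------+------+------+------+------+
# | id|value1|value2|value3|value4|value5|
# +---+------+------+------+------+------+
# |  A|     2|     3|     3|     3|     6|
# |  B|     1|     1|     1|     1|     1|
# +---+------+------+------+------+------+
```

Note that if the data actually contains NaN (for example in float columns loaded from Pandas), coalesce will keep the NaN; those values would first need to be turned into nulls, e.g. using pyspark.sql.functions.isnan with when, or pyspark.sql.functions.nanvl.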
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | seghair tarek |
