Loop through columns in a dataframe and add a new column with the first non-null value found, using PySpark
I'm new to PySpark and trying to figure out how to achieve the desired result below.
I have a dataframe that contains several columns. I want to loop through the columns id1, id2 and id3, and once the first non-null value is found, add a new column containing that value. Once the value is found, no further looping is needed for that record.
The dataframe:
| name | id1 | hobby | id2 | gender | id3 | language |
|---|---|---|---|---|---|---|
| Mike | AAA-BBB | Fishing | | M | AAA-BBB | Eng |
| Louis | | | ABC-DDD | M | | |
| Peter | DSA-SDF | Hunting | DSA-SDF | M | DSA-SDF | Eng |
The desired dataframe:
| name | id1 | hobby | id2 | gender | id3 | language | id |
|---|---|---|---|---|---|---|---|
| Mike | AAA-BBB | Fishing | | M | AAA-BBB | Eng | AAA-BBB |
| Louis | | | ABC-DDD | M | | | ABC-DDD |
| Peter | DSA-SDF | Hunting | DSA-SDF | M | DSA-SDF | Eng | DSA-SDF |
Any help would be greatly appreciated.
Solution 1:[1]
You can do that with the following:
from pyspark.sql import functions

# Take id1 if it is not null, otherwise fall back to id2, then id3.
# A when-chain without an otherwise() leaves the new column null
# when all three id columns are null.
df = (df.withColumn("id", functions.when(df["id1"].isNotNull(), df["id1"])
                                   .when(df["id2"].isNotNull(), df["id2"])
                                   .when(df["id3"].isNotNull(), df["id3"])))
df.show()
+-----+-------+-------+-------+------+-------+--------+-------+
| name|    id1|  hobby|    id2|gender|    id3|language|     id|
+-----+-------+-------+-------+------+-------+--------+-------+
| Mike|AAA-BBB|Fishing|   null|     M|AAA-BBB|     Eng|AAA-BBB|
|Louis|   null|   null|ABC-DDD|     M|   null|    null|ABC-DDD|
|Peter|DSA-SDF|Hunting|DSA-SDF|     M|DSA-SDF|     Eng|DSA-SDF|
+-----+-------+-------+-------+------+-------+--------+-------+
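If the list of id columns grows or is built dynamically, the same result can be written more concisely with functions.coalesce, which returns the first non-null value among its arguments. A minimal sketch of that alternative, assuming the same id1/id2/id3 columns as above:

from pyspark.sql import functions

# coalesce scans its arguments left to right and keeps the first
# non-null value, equivalent to the when-chain above.
df = df.withColumn("id", functions.coalesce(df["id1"], df["id2"], df["id3"]))

# The same works for an arbitrary list of column names:
id_cols = ["id1", "id2", "id3"]
df = df.withColumn("id", functions.coalesce(*[functions.col(c) for c in id_cols]))

Because coalesce evaluates to the first non-null argument for each row, it matches the "no more looping once a value is found" behaviour asked for in the question.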
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BoomBoxBoy |
