Spark Scala - Split DataFrame column into multiple columns depending on the size of the column
I need to split a column into several columns depending on the number of fields each record has. For example, given the following DF:
+---+-------------------------------------------------+---+
|...|unique_code                                      |...|
+---+-------------------------------------------------+---+
|...|2022-12-31 00:00:00.000000000*_*AAAAA*_*000000000|...|
|...|2022-12-31 00:00:00.000000000*_*BBBB             |...|
|...|2022-12-31 00:00:00.000000000*_*CCC*_*1111*_*XX  |...|
+---+-------------------------------------------------+---+
I know it will have at most 4 fields and at least 1, always in the same order, given by this list:
val uniqueCodeFields = List("col1", "col2", "col3", "col4")
Therefore the resulting DF would be the following:
+---+-----------------------------+-----+---------+----+---+
|...|col1                         |col2 |col3     |col4|...|
+---+-----------------------------+-----+---------+----+---+
|...|2022-12-31 00:00:00.000000000|AAAAA|000000000|NULL|...|
|...|2022-12-31 00:00:00.000000000|BBBB |NULL     |NULL|...|
|...|2022-12-31 00:00:00.000000000|CCC  |1111     |XX  |...|
+---+-----------------------------+-----+---------+----+---+
I developed the following, based on https://stackoverflow.com/a/45972636/9025222:
chgPivotedDF.withColumn("temp", split(col("unique_code"), "\\*_\\*")).select(
(0 until size(col("temp"))).map(i => col("temp").getItem(i).as(uniqueCodeFields(i))): _*
)
But I am not able to get the length of the "temp" column so as to loop only up to its size for each record, and I get the following error:
error: type mismatch;
found : org.apache.spark.sql.Column
required: Int
(0 until col($"temp")).map(i => col("temp").getItem(i).as(uniqueCodeFields(i))): _*
^
Any help is welcome, thanks!
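The type mismatch happens because size(col("temp")) is a Column, i.e. a per-row value computed at runtime, while 0 until ... needs a plain Scala Int at compile time. Since the maximum number of fields (4) is known up front, one workaround is to loop over the fixed field list instead and rely on getItem returning null for out-of-range array indices (Spark's default, non-ANSI behavior). A minimal sketch, with an illustrative SparkSession setup and sample data standing in for the real chgPivotedDF:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val uniqueCodeFields = List("col1", "col2", "col3", "col4")

// Illustrative stand-in for chgPivotedDF.
val chgPivotedDF = Seq(
  "2022-12-31 00:00:00.000000000*_*AAAAA*_*000000000",
  "2022-12-31 00:00:00.000000000*_*BBBB",
  "2022-12-31 00:00:00.000000000*_*CCC*_*1111*_*XX"
).toDF("unique_code")

// Iterate over the fixed field list instead of the per-row array size:
// getItem(i) yields null when the split produced fewer than i + 1 fields,
// which gives the NULL padding in the expected output.
val result = chgPivotedDF
  .withColumn("temp", split(col("unique_code"), "\\*_\\*"))
  .select(
    col("*") +: uniqueCodeFields.zipWithIndex.map {
      case (name, i) => col("temp").getItem(i).as(name)
    }: _*
  )
  .drop("temp")

result.show(false)

With default settings, result.show(false) prints col1 through col4 alongside the original columns, matching the expected DF above, including the NULLs for the shorter records.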
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow