'Sort by key in map type column for each row in spark dataframe
I have a spark dataframe in the below format:
Name LD_Value
A37 Map(10 -> 0.20,5 -> 0.30,17 -> 0.25)
A39 Map(11 -> 0.40,6 -> 0.67,24 -> 0.45)
I need to sort based on keys in LD_Value column for each record in descending order.
Expected output:
Name LD_Value
A37 Map(17 -> 0.25,10 -> 0.20,5 -> 0.30)
A39 Map(24 -> 0.45,11 -> 0.40,6 -> 0.67)
Is it possible to do sorting on map type column in spark dataframe?
I looked into spark higher-order functions but no luck.
Solution 1:[1]
You can first get the keys of the map using map_keys function, sort the array of keys then use transform to get the corresponding value for each key element from the original map, and finally update the map column by creating a new map from the two arrays using map_from_arrays function.
For Spark 3+, you can sort the array of keys in descending order by using a comparator function as the second argument to array_sort function :
from pyspark.sql import functions as F
df1 = df.withColumn(
"LD_Value_keys",
F.expr("array_sort(map_keys(LD_Value), (x, y) -> case when x > y then -1 when x < y then 1 else 0 end)")
).withColumn("LD_Value_values", F.expr("transform(LD_Value_keys, x -> LD_Value[x])")) \
.withColumn("LD_Value", F.map_from_arrays(F.col("LD_Value_keys"), F.col("LD_Value_values"))) \
.drop("LD_Value_keys", "LD_Value_values")
df1.show()
#+----+----------------------------------+
#|Name|LD_Value |
#+----+----------------------------------+
#|A37 |[17 -> 0.25, 10 -> 0.2, 5 -> 0.3] |
#|A39 |[24 -> 0.45, 11 -> 0.4, 6 -> 0.67]|
#+----+----------------------------------+
For Spark < 3, you can sort an array in descending order using this UDF:
# array_sort_udf (array, reverse): if reverse = True then desc
array_sort_udf = F.udf(lambda arr, r: sorted(arr, reverse=r), ArrayType(StringType()))
And use it like this:
df.withColumn("LD_Value_keys", array_sort_udf(F.map_keys(F.col("LD_Value")), F.lit(True)))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
