'Sort by key in map type column for each row in spark dataframe

I have a spark dataframe in the below format:

Name   LD_Value
A37    Map(10 -> 0.20,5 -> 0.30,17 -> 0.25)
A39    Map(11 -> 0.40,6 -> 0.67,24 -> 0.45)

I need to sort based on keys in LD_Value column for each record in descending order.

Expected output:

Name   LD_Value
A37    Map(17 -> 0.25,10 -> 0.20,5 -> 0.30)
A39    Map(24 -> 0.45,11 -> 0.40,6 -> 0.67)

Is it possible to do sorting on map type column in spark dataframe?

I looked into spark higher-order functions but no luck.



Solution 1:[1]

You can first get the keys of the map using map_keys function, sort the array of keys then use transform to get the corresponding value for each key element from the original map, and finally update the map column by creating a new map from the two arrays using map_from_arrays function.

For Spark 3+, you can sort the array of keys in descending order by using a comparator function as the second argument to array_sort function :

from pyspark.sql import functions as F

df1 = df.withColumn(
    "LD_Value_keys",
    F.expr("array_sort(map_keys(LD_Value), (x, y) -> case when x > y then -1 when x < y then 1 else 0 end)")
).withColumn("LD_Value_values", F.expr("transform(LD_Value_keys, x -> LD_Value[x])")) \
 .withColumn("LD_Value", F.map_from_arrays(F.col("LD_Value_keys"), F.col("LD_Value_values"))) \
 .drop("LD_Value_keys", "LD_Value_values")

df1.show() 

#+----+----------------------------------+
#|Name|LD_Value                          |
#+----+----------------------------------+
#|A37 |[17 -> 0.25, 10 -> 0.2, 5 -> 0.3] |
#|A39 |[24 -> 0.45, 11 -> 0.4, 6 -> 0.67]|
#+----+----------------------------------+

For Spark < 3, you can sort an array in descending order using this UDF:

# array_sort_udf (array, reverse): if reverse = True then desc
array_sort_udf = F.udf(lambda arr, r: sorted(arr, reverse=r), ArrayType(StringType()))

And use it like this:

df.withColumn("LD_Value_keys", array_sort_udf(F.map_keys(F.col("LD_Value")), F.lit(True)))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1