'How can I concatenate all values in descending order that have the same primary key in HIVE?

I am using HIVE and I have a table like this:

S.no ID applicant_num f_name l_name Primary Key
1 123 202201A1 akhil yadav 123~&~akhil~&~yadav
2 123 202202A2 akhil yadav 123~&~akhil~&~yadav
3 123 202203A2 akhil yadav 123~&~akhil~&~yadav
4 987 202201B8 ankit yadav 987~&~ankit~&~yadav
5 987 202203B19 ankit yadav 987~&~ankit~&~yadav
6 987 202208B34 ankit yadav 987~&~ankit~&~yadav

I want to concatenate all the applicant_num for same primary_key into a string such that they maintain their descending order (Last application should come first).

Here is what I tried:

select 
    concat_ws('~', collect_set(applicant_num)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key  

Using this query I am getting a result like this:

applicant_num Primary Key
202203A2~202201A1~202202A2 123~&~akhil~&~yadav
202203B19~202208B34~202201B8 987~&~ankit~&~yadav

whereas what I want is

applicant_num Primary Key
202203A2~202202A2~202201A1 123~&~akhil~&~yadav
202208B34~202203B19~202201B8 987~&~ankit~&~yadav

The concatenation is not maintaining the descending order.

I tried

select 
    concat_ws('~', sort_array(collect_set(applicant_num))), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key  

But sort_array returns array in ascending order and it cannot be sorted in descending.

Is there any to concatenate all applicant_num in one string for a primary key while maintaining the descending order of applicant_num?



Solution 1:[1]

PySpark API of array_sort always sorting in ascending order (shame).

Luckily, Spark SQL API of array_sort does support it (sort of). So despite the unnecessary lengthy expression, this is what you're looking for

from pyspark.sql import functions as F

df = spark.createDataFrame([
    (1, '1x'),
    (1, '1a'),
    (1, '1b'),
    (2, '2a'),
    (2, '2b'),
    (2, '2c'),
], ['id', 'col'])

(df
    .groupBy('id')
    .agg(
        F.concat_ws('~', F.expr('array_sort(collect_list(col), (left, right) -> case when left < right then 1 when left > right then -1 else 0 end)')).alias('col')
    )
    .show()
)

+---+--------+
| id|     col|
+---+--------+
|  1|1x~1b~1a|
|  2|2c~2b~2a|
+---+--------+

Solution 2:[2]

sort_array can also be sorted in descending order, Please refer to here

select 
    concat_ws('~', sort_array(collect_set(applicant_num), false)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by primary_key  

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pltc
Solution 2 过过招