'How can I concatenate all values in descending order that have the same primary key in HIVE?

I am using HIVE and I have a table like this:

S.no	ID	applicant_num	f_name	l_name	Primary Key
1	123	202201A1	akhil	yadav	123~&~akhil~&~yadav
2	123	202202A2	akhil	yadav	123~&~akhil~&~yadav
3	123	202203A2	akhil	yadav	123~&~akhil~&~yadav
4	987	202201B8	ankit	yadav	987~&~ankit~&~yadav
5	987	202203B19	ankit	yadav	987~&~ankit~&~yadav
6	987	202208B34	ankit	yadav	987~&~ankit~&~yadav

I want to concatenate all the applicant_num for same primary_key into a string such that they maintain their descending order (Last application should come first).

Here is what I tried:

select 
    concat_ws('~', collect_set(applicant_num)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key

Using this query I am getting a result like this:

applicant_num	Primary Key
202203A2~202201A1~202202A2	123~&~akhil~&~yadav
202203B19~202208B34~202201B8	987~&~ankit~&~yadav

whereas what I want is

applicant_num	Primary Key
202203A2~202202A2~202201A1	123~&~akhil~&~yadav
202208B34~202203B19~202201B8	987~&~ankit~&~yadav

The concatenation is not maintaining the descending order.

I tried

select 
    concat_ws('~', sort_array(collect_set(applicant_num))), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key

But sort_array returns array in ascending order and it cannot be sorted in descending.

Is there any to concatenate all applicant_num in one string for a primary key while maintaining the descending order of applicant_num?

Solution 1:^[1]

PySpark API of array_sort always sorting in ascending order (shame).

Luckily, Spark SQL API of array_sort does support it (sort of). So despite the unnecessary lengthy expression, this is what you're looking for

from pyspark.sql import functions as F

df = spark.createDataFrame([
    (1, '1x'),
    (1, '1a'),
    (1, '1b'),
    (2, '2a'),
    (2, '2b'),
    (2, '2c'),
], ['id', 'col'])

(df
    .groupBy('id')
    .agg(
        F.concat_ws('~', F.expr('array_sort(collect_list(col), (left, right) -> case when left < right then 1 when left > right then -1 else 0 end)')).alias('col')
    )
    .show()
)

+---+--------+
| id|     col|
+---+--------+
|  1|1x~1b~1a|
|  2|2c~2b~2a|
+---+--------+

Solution 2:^[2]

sort_array can also be sorted in descending order, Please refer to here

select 
    concat_ws('~', sort_array(collect_set(applicant_num), false)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by primary_key

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	pltc
Solution 2	è¿‡è¿‡æ‹›

'How can I concatenate all values in descending order that have the same primary key in HIVE?

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]