'How can I concatenate all values in descending order that have the same primary key in HIVE?
I am using HIVE and I have a table like this:
S.no | ID | applicant_num | f_name | l_name | Primary Key |
---|---|---|---|---|---|
1 | 123 | 202201A1 | akhil | yadav | 123~&~akhil~&~yadav |
2 | 123 | 202202A2 | akhil | yadav | 123~&~akhil~&~yadav |
3 | 123 | 202203A2 | akhil | yadav | 123~&~akhil~&~yadav |
4 | 987 | 202201B8 | ankit | yadav | 987~&~ankit~&~yadav |
5 | 987 | 202203B19 | ankit | yadav | 987~&~ankit~&~yadav |
6 | 987 | 202208B34 | ankit | yadav | 987~&~ankit~&~yadav |
I want to concatenate all the applicant_num for same primary_key into a string such that they maintain their descending order (Last application should come first).
Here is what I tried:
select
concat_ws('~', collect_set(applicant_num)), primary_key
from
(select *
from table
order by applicant_num desc)
group by
primary_key
Using this query I am getting a result like this:
applicant_num | Primary Key |
---|---|
202203A2~202201A1~202202A2 | 123~&~akhil~&~yadav |
202203B19~202208B34~202201B8 | 987~&~ankit~&~yadav |
whereas what I want is
applicant_num | Primary Key |
---|---|
202203A2~202202A2~202201A1 | 123~&~akhil~&~yadav |
202208B34~202203B19~202201B8 | 987~&~ankit~&~yadav |
The concatenation is not maintaining the descending order.
I tried
select
concat_ws('~', sort_array(collect_set(applicant_num))), primary_key
from
(select *
from table
order by applicant_num desc)
group by
primary_key
But sort_array
returns array in ascending order and it cannot be sorted in descending.
Is there any to concatenate all applicant_num in one string for a primary key while maintaining the descending order of applicant_num?
Solution 1:[1]
PySpark API of array_sort
always sorting in ascending order (shame).
Luckily, Spark SQL API of array_sort
does support it (sort of). So despite the unnecessary lengthy expression, this is what you're looking for
from pyspark.sql import functions as F
df = spark.createDataFrame([
(1, '1x'),
(1, '1a'),
(1, '1b'),
(2, '2a'),
(2, '2b'),
(2, '2c'),
], ['id', 'col'])
(df
.groupBy('id')
.agg(
F.concat_ws('~', F.expr('array_sort(collect_list(col), (left, right) -> case when left < right then 1 when left > right then -1 else 0 end)')).alias('col')
)
.show()
)
+---+--------+
| id| col|
+---+--------+
| 1|1x~1b~1a|
| 2|2c~2b~2a|
+---+--------+
Solution 2:[2]
sort_array
can also be sorted in descending
order, Please refer to here
select
concat_ws('~', sort_array(collect_set(applicant_num), false)), primary_key
from
(select *
from table
order by applicant_num desc)
group by primary_key
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | pltc |
Solution 2 | 过过招 |