Selecting the max columns after group by in SQL

I'm using Spark-SQL in Databricks. I have a dataset like this (but with way more features):

Country | ID  | feature_1 | feature_2 | feature_3 | feature_4
US      | 123 | 100       | 5         | 40        | 60
US      | 456 | 200       | 30        | 9         | 70
CA      | 789 | 50        | 12        | 45        | 90
CA      | 999 | 250       | 180       | 17        | 40

For each (Country, ID) pair, I want to know which three features have the highest values, in descending order, like this:

Country | ID  | max_1     | max_2     | max_3
US      | 123 | feature_1 | feature_4 | feature_3
US      | 456 | feature_1 | feature_4 | feature_2
CA      | 789 | feature_4 | feature_1 | feature_3
CA      | 999 | feature_1 | feature_2 | feature_4

I was thinking of using a row_number() window function, but I'm not sure how to rank across columns (rather than rows) to capture the top 3. Does anyone have any ideas? Would I need to unpivot the data first? The solution also needs to be dynamic, since I have a large number of feature columns.
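One possible way to approach this, as a rough sketch rather than a definitive answer: unpivot the feature columns into (feature, value) rows, rank them with row_number() per (Country, ID), and pivot the top ranks back into columns. To keep it dynamic, the query text can be generated from a list of column names. The helper `build_top_n_sql`, the table name `features`, and the intermediate column names `feature`, `value`, and `rn` are all made up for illustration. The sketch runs the generated SQL against an in-memory SQLite database only so that it is self-contained; the statement uses only CTEs, UNION ALL, and ROW_NUMBER(), which Spark SQL also supports, but it has not been run on Databricks here, and it assumes all feature columns share one numeric type.

```python
import sqlite3


def build_top_n_sql(table, key_cols, feature_cols, n=3):
    """Build SQL naming the n largest feature columns per key (hypothetical helper)."""
    keys = ", ".join(key_cols)
    # Unpivot: one SELECT per feature column, stacked with UNION ALL.
    unpivot = "\n    UNION ALL\n    ".join(
        f"SELECT {keys}, '{col}' AS feature, {col} AS value FROM {table}"
        for col in feature_cols
    )
    # Pivot back: one output column per rank (max_1, max_2, ...).
    picks = ",\n       ".join(
        f"MAX(CASE WHEN rn = {i} THEN feature END) AS max_{i}"
        for i in range(1, n + 1)
    )
    # Ties on value are broken by feature name so the ranking is deterministic.
    return f"""
WITH unpivoted AS (
    {unpivot}
),
ranked AS (
    SELECT {keys}, feature,
           ROW_NUMBER() OVER (PARTITION BY {keys}
                              ORDER BY value DESC, feature) AS rn
    FROM unpivoted
)
SELECT {keys},
       {picks}
FROM ranked
WHERE rn <= {n}
GROUP BY {keys}
ORDER BY {keys}
"""


# Load the sample rows from the question into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (Country TEXT, ID INTEGER, feature_1 INTEGER, "
    "feature_2 INTEGER, feature_3 INTEGER, feature_4 INTEGER)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("US", 123, 100, 5, 40, 60),
        ("US", 456, 200, 30, 9, 70),
        ("CA", 789, 50, 12, 45, 90),
        ("CA", 999, 250, 180, 17, 40),
    ],
)

feature_cols = ["feature_1", "feature_2", "feature_3", "feature_4"]
sql = build_top_n_sql("features", ["Country", "ID"], feature_cols, n=3)
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

In Spark, the list of feature columns could presumably be taken from the DataFrame's column list instead of being typed out, and the generated string passed to spark.sql(). Spark SQL also has a stack() generator function that can unpivot in a single SELECT, which may be preferable to a long UNION ALL when there are many features.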

sql apache-spark-sql

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
