Selecting the max columns after group by in SQL
I'm using Spark-SQL in Databricks. I have a dataset like this (but with way more features):
Country | ID | feature_1 | feature_2 | feature_3 | feature_4
US | 123 | 100 | 5 | 40 | 60
US | 456 | 200 | 30 | 9 | 70
CA | 789 | 50 | 12 | 45 | 90
CA | 999 | 250 | 180 | 17 | 40
I'd like to know which 3 features have the highest values for each (Country, ID) group, like this:
Country | ID | max_1 | max_2 | max_3
US | 123 | feature_1 | feature_4 | feature_3
US | 456 | feature_1 | feature_4 | feature_2
CA | 789 | feature_4 | feature_1 | feature_3
CA | 999 | feature_1 | feature_2 | feature_4
I was thinking of using a row_number() window function, but I'm not sure how to rank across columns to capture the top 3. Does anyone have any ideas? Would I need to unpivot the data first? (I also need this to be dynamic, since I have a large number of features.)
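One possible way to approach this, sketched under a few assumptions: unpivot the feature columns into (feature, value) rows with Spark SQL's stack() generator, rank them with row_number() within each Country, ID group, and pivot the top ranks back into columns with conditional aggregation. To keep it dynamic, the query text can be generated from a list of column names. The helper name build_top_n_query and the table name features are hypothetical, and the sketch assumes all feature columns share a compatible numeric type (stack() needs that) and have plain identifiers that need no backtick quoting.

```python
def build_top_n_query(table, key_cols, feature_cols, n=3):
    """Return a Spark SQL string naming the n largest feature columns per key group.

    Hypothetical helper: it only builds text and does not touch Spark.
    """
    keys = ", ".join(key_cols)
    # stack(k, 'name1', col1, 'name2', col2, ...) emits one (feature, value)
    # row per feature column, which is the unpivot step.
    pairs = ", ".join(f"'{c}', {c}" for c in feature_cols)
    # Conditional aggregation turns rank 1..n back into columns max_1..max_n.
    picks = ",\n       ".join(
        f"max(CASE WHEN rn = {i} THEN feature END) AS max_{i}"
        for i in range(1, n + 1)
    )
    lines = [
        "WITH unpivoted AS (",
        f"  SELECT {keys}, stack({len(feature_cols)}, {pairs}) AS (feature, value)",
        f"  FROM {table}",
        "),",
        "ranked AS (",
        f"  SELECT {keys}, feature,",
        # The feature name is a tie-breaker so equal values rank deterministically.
        f"         row_number() OVER (PARTITION BY {keys} ORDER BY value DESC, feature) AS rn",
        "  FROM unpivoted",
        ")",
        f"SELECT {keys},",
        f"       {picks}",
        "FROM ranked",
        f"WHERE rn <= {n}",
        f"GROUP BY {keys}",
    ]
    return "\n".join(lines)


# Assumed names: a table called `features` with columns feature_1..feature_4.
feature_cols = [f"feature_{i}" for i in range(1, 5)]
query = build_top_n_query("features", ["Country", "ID"], feature_cols)
print(query)

# In a Databricks notebook, where `spark` is predefined, the string could be run with:
# spark.sql(query).show()
```

With many features, the column list could instead be derived from the table itself, for example by taking spark.table("features").columns and dropping the key columns. Note that row_number() always picks exactly one feature per rank, so ties are resolved here by feature name rather than reported as ties.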
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
