Group by then sum of multiple columns in Scala Spark
I have a DataFrame with hundreds of feature columns, like this:
Country | ID  | Feature_1 | Feature_2 | Feature_3 | ...
US      | 123 | 1         | 5         | 0         | ...
US      | 456 | 0         | 10        | 1         | ...
CA      | 789 | 0         | 6         | 1         | ...
CA      | 999 | 0         | 3         | 0         | ...
...
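For reference, a minimal version of this DataFrame (just the four sample rows and three features; the local session setup is an assumption for testing) can be built like this:

import org.apache.spark.sql.SparkSession

// Hypothetical local session for testing; adjust to your environment.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// The four sample rows from the table above.
val df = Seq(
  ("US", 123, 1, 5, 0),
  ("US", 456, 0, 10, 1),
  ("CA", 789, 0, 6, 1),
  ("CA", 999, 0, 3, 0)
).toDF("Country", "ID", "Feature_1", "Feature_2", "Feature_3")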
I want to perform a group by on Country, then take the sum per feature, so I should end up with something like this:
Country | Feature_1 | Feature_2 | Feature_3 | ...
US      | 1         | 15        | 1         | ...
CA      | 0         | 9         | 1         | ...
How can I efficiently compute the aggregate sum function for all of my hundreds of features? I know for one feature, it's like this:
df.groupBy("Country").sum("Feature_1")
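One approach that should scale to hundreds of columns (a sketch; the "Feature_" prefix filter is an assumption based on the sample column names) is to build the sum expressions programmatically and pass them to agg:

import org.apache.spark.sql.functions.sum

// Every column whose name starts with "Feature_" (assumed naming scheme;
// adjust the filter to match your actual columns).
val featureCols = df.columns.filter(_.startsWith("Feature_"))

// One sum(...) expression per feature, aliased back to the original
// name so the output schema stays readable.
val sumExprs = featureCols.map(c => sum(c).as(c))

// agg takes a first Column plus varargs, hence the head/tail split.
val result = df.groupBy("Country").agg(sumExprs.head, sumExprs.tail: _*)
result.show()

Alternatively, df.groupBy("Country").sum(featureCols: _*) works in a single call, though the resulting columns come back named like sum(Feature_1) rather than keeping the original names.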
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.