Group by then sum of multiple columns in Scala Spark
I have a DataFrame with hundreds of feature columns, like this:
Country | ID  | Feature_1 | Feature_2 | Feature_3 | ...
US      | 123 | 1         | 5         | 0         | ...
US      | 456 | 0         | 10        | 1         | ...
CA      | 789 | 0         | 6         | 1         | ...
CA      | 999 | 0         | 3         | 0         | ...
...
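For reference, a minimal version of this DataFrame (just the four sample rows and three features; the local session setup is an assumption for testing) can be built like this:

import org.apache.spark.sql.SparkSession

// Hypothetical local session for testing; adjust to your environment.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// The four sample rows from the table above.
val df = Seq(
  ("US", 123, 1, 5, 0),
  ("US", 456, 0, 10, 1),
  ("CA", 789, 0, 6, 1),
  ("CA", 999, 0, 3, 0)
).toDF("Country", "ID", "Feature_1", "Feature_2", "Feature_3")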
I want to perform a group by on Country, then take the sum per feature, so I should end up with something like this:
Country | Feature_1 | Feature_2 | Feature_3 | ...
US      | 1         | 15        | 1         | ...
CA      | 0         | 9         | 1         | ...
How can I efficiently compute the aggregate sum function for all of my hundreds of features? I know for one feature, it's like this:
df.groupBy("Country").sum("Feature_1")
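One approach that should scale to hundreds of columns (a sketch; the "Feature_" prefix filter is an assumption based on the sample column names) is to build the sum expressions programmatically and pass them to agg:

import org.apache.spark.sql.functions.sum

// Every column whose name starts with "Feature_" (assumed naming scheme;
// adjust the filter to match your actual columns).
val featureCols = df.columns.filter(_.startsWith("Feature_"))

// One sum(...) expression per feature, aliased back to the original
// name so the output schema stays readable.
val sumExprs = featureCols.map(c => sum(c).as(c))

// agg takes a first Column plus varargs, hence the head/tail split.
val result = df.groupBy("Country").agg(sumExprs.head, sumExprs.tail: _*)
result.show()

Alternatively, df.groupBy("Country").sum(featureCols: _*) works in a single call, though the resulting columns come back named like sum(Feature_1) rather than keeping the original names.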
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.