Transformations on CompactBuffer in Spark

Let's say I have a pair RDD as follows:

(John,10)
(John,9)
(Rachel,5)
(Rachel,6)
(Rachel,8)

Now if I run groupByKey(), I get the following result:

(John,CompactBuffer(10,9)) 
(Rachel,CompactBuffer(5,6,8))

How do I transform this into a pair RDD of (String, (Int, Int)), as follows?

(John,(19,2))   { 1st entry is sum and 2nd entry is count }
(Rachel,(19,3))

I know this is possible with other methods that avoid groupByKey, but I want to know how to work with groupByKey and CompactBuffer.



Solution 1:[1]

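Treat the CompactBuffer like a normal ArrayBuffer: groupByKey exposes each group's values as an Iterable[Int], so the usual collection methods such as sum and size apply.

On the RDD returned by groupByKey, map over the values of each key and compute the sum and the count of the group, for example with mapValues. The .agg(sum("value") as "valueSum", count("value")) form belongs to the DataFrame API and applies when you group a DataFrame with groupBy rather than an RDD.

Refer to the Stack Overflow question about CompactBuffer.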
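As a minimal sketch of the approach above (not the original answerer's code), the following Scala program rebuilds the pair RDD from the question, calls groupByKey, and then turns each group into a (sum, count) tuple with mapValues. The object name GroupByKeySumCount, the helper sumAndCount, and the local[*] master are assumptions chosen for illustration; the sketch also assumes a Spark version that provides SparkSession (2.x or later).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the names are illustrative only.
object GroupByKeySumCount {

  // groupByKey hands each group's values over as an Iterable[Int]
  // (backed by a CompactBuffer), so plain collection methods work.
  def sumAndCount(values: Iterable[Int]): (Int, Int) =
    (values.sum, values.size)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupByKey-sum-count")
      .getOrCreate()
    val sc = spark.sparkContext

    // The pair RDD from the question.
    val pairs = sc.parallelize(Seq(
      ("John", 10), ("John", 9),
      ("Rachel", 5), ("Rachel", 6), ("Rachel", 8)
    ))

    // RDD[(String, Iterable[Int])]
    val grouped = pairs.groupByKey()

    // RDD[(String, (Int, Int))]: first entry is the sum, second the count.
    val result = grouped.mapValues(sumAndCount)

    // Prints (John,(19,2)) and (Rachel,(19,3)); the order is not guaranteed.
    result.collect().foreach(println)

    spark.stop()
  }
}
```

The same tuple can be produced inline with grouped.mapValues(vs => (vs.sum, vs.size)); the named helper is only there to keep the per-group logic easy to read and test.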

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 devilpreet