Transformations on CompactBuffer in Spark
Let's say I have a pair RDD as follows:
(John,10)
(John,9)
(Rachel,5)
(Rachel,6)
(Rachel,8)
Now if I run groupByKey(), I get the following result:
(John,CompactBuffer(10,9))
(Rachel,CompactBuffer(5,6,8))
How do I transform this into a pair RDD of (String, (Int, Int)) as follows?
(John,(19,2)) { 1st entry is sum and 2nd entry is count }
(Rachel,(19,3))
I know this is possible with other methods that avoid groupByKey, but I want to know how to work with groupByKey and CompactBuffer.
Solution 1:[1]
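groupByKey exposes each CompactBuffer as an Iterable, so treat it like a normal ArrayBuffer.
On the RDD returned by groupByKey, map over the grouped values, for example .mapValues(v => (v.sum, v.size)). The .agg(sum("value") as "valueSum", count("value")) form is the DataFrame equivalent (used after groupBy on a DataFrame); it is not available on an RDD.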
Refer to the Stack Overflow question about CompactBuffer.
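A minimal sketch of the groupByKey route, assuming a local SparkContext and the sample data from the question. The object name GroupByKeySumCount and the helper sumAndCount are made up for illustration; groupByKey, mapValues, and collect are the standard RDD calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object, named only for this example.
object GroupByKeySumCount {

  // Builds the sample pair RDD, groups it, and reduces each group to (sum, count).
  def sumAndCount(sc: SparkContext): Map[String, (Int, Int)] = {
    val pairs = sc.parallelize(Seq(
      ("John", 10), ("John", 9),
      ("Rachel", 5), ("Rachel", 6), ("Rachel", 8)
    ))

    // RDD[(String, Iterable[Int])]; each Iterable prints as CompactBuffer(...)
    val grouped = pairs.groupByKey()

    // The grouped values support the usual collection methods such as sum and size.
    val result = grouped.mapValues(values => (values.sum, values.size))

    result.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust the master for a real cluster.
    val conf = new SparkConf().setAppName("groupByKey-sum-count").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      sumAndCount(sc).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With the sample data this should give (John,(19,2)) and (Rachel,(19,3)); the order in which the entries print is not guaranteed.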
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | devilpreet |
