Spark: groupBy function when the input is a List with multiple keys

I am new to Spark. I need to create an application that compares two datasets with different structures and calculates a score for each pair. My plan is to broadcast the smaller dataset (Dataset A) and apply a map function to each element of Dataset B. As a result, each element of Dataset B produces multiple Score objects, one for each Dataset A-B pair, so I'm collecting them in a List of (Dataset A, Score) pairs per Dataset B element. After that, I want to group the Score objects by the Dataset A element as the key across all documents, so that in the end I have a (Dataset A, List of Score) pair for each Dataset A element. I can't find a way to do this, because groupBy doesn't take a list as input.

A Score object contains the Dataset B object and an integer score.
The result of the map function is a List<Dataset A, Score>:
a1, Score
a2, Score
a3, Score

I want to group them by Dataset A as the key, so I will have:
a1, List of Score
a2, List of Score
a3, List of Score
Each List of Score contains the Score objects from all elements of Dataset B.
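In other words, what I want is equivalent to flattening the per-B lists of pairs and then grouping the scores by their Dataset A key (in Spark terms, something like flatMapToPair followed by groupByKey). Here is a plain-Python sketch of that grouping, outside of Spark, with made-up stand-in data (the names and values are illustrative only):

```python
from collections import defaultdict

# Hypothetical stand-in: each inner list is the map-function output for
# one element of Dataset B, i.e. a list of (a_key, score) pairs.
per_b_results = [
    [("a1", 10), ("a2", 20), ("a3", 30)],  # pairs produced for b1
    [("a1", 11), ("a2", 21), ("a3", 31)],  # pairs produced for b2
]

# Flatten the per-B lists into individual (a_key, score) pairs,
# then group the scores under their Dataset A key.
grouped = defaultdict(list)
for pairs in per_b_results:
    for a_key, score in pairs:
        grouped[a_key].append(score)

# grouped now maps each Dataset A key to the scores from all of Dataset B,
# e.g. grouped["a1"] holds the scores a1 received against b1 and b2.
```

The Spark equivalent would emit each (a_key, score) pair individually from the map step instead of returning the whole list, and then let Spark do the grouping by key.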



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
