Spark: groupBy function when the input is a List with multiple keys

I am new to Spark. I need to create an application that compares two datasets with different structures and calculates a score for each pair. My plan is to broadcast the smaller dataset (Dataset A) and apply a map function to each element of Dataset B. As a result, each element of Dataset B produces multiple Score objects, one for each Dataset A-B pair, so I'm collecting them in a List of (Dataset A, Score) pairs per Dataset B element. After that, I want to group the Score objects by the Dataset A element as the key across all documents, so that in the end I have a (Dataset A, List of Score) pair for each Dataset A element. I can't find a way to do this, because groupBy doesn't take a list as input.

A Score object contains the Dataset B object and an integer score.
The result of the map function is a List<Dataset A, Score>:
a1, Score
a2, Score
a3, Score

I want to group them by Dataset A as the key, so I will have:
a1, List of Score
a2, List of Score
a3, List of Score
Each List of Score contains the Score objects from all elements of Dataset B.
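In other words, what I want is equivalent to flattening the per-B lists of pairs and then grouping the scores by their Dataset A key (in Spark terms, something like flatMapToPair followed by groupByKey). Here is a plain-Python sketch of that grouping, outside of Spark, with made-up stand-in data (the names and values are illustrative only):

```python
from collections import defaultdict

# Hypothetical stand-in: each inner list is the map-function output for
# one element of Dataset B, i.e. a list of (a_key, score) pairs.
per_b_results = [
    [("a1", 10), ("a2", 20), ("a3", 30)],  # pairs produced for b1
    [("a1", 11), ("a2", 21), ("a3", 31)],  # pairs produced for b2
]

# Flatten the per-B lists into individual (a_key, score) pairs,
# then group the scores under their Dataset A key.
grouped = defaultdict(list)
for pairs in per_b_results:
    for a_key, score in pairs:
        grouped[a_key].append(score)

# grouped now maps each Dataset A key to the scores from all of Dataset B,
# e.g. grouped["a1"] holds the scores a1 received against b1 and b2.
```

The Spark equivalent would emit each (a_key, score) pair individually from the map step instead of returning the whole list, and then let Spark do the grouping by key.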



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
