What hashing algorithm does Spark use to groupByKey a binary column?
My ETL pipeline, built on Spark 2.4.3, needs to group a Dataset by a BINARY column (represented as Array[Byte] in Scala) and a Long column. The BINARY column holds 16-byte values, so its key space is about 2^128. I am using the groupByKey API provided by the Spark Dataset:
val grouped = data.groupByKey(
  row => (row.binary_column, row.long_column)
)
Now I don't know how Spark ensures that there won't be any collisions. Specifically, I need to know which hashing algorithm Spark uses to prevent collisions.
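For context, Spark's shuffle assigns keys to partitions using the key's `hashCode` taken modulo the number of partitions (HashPartitioner-style); hash collisions only affect which partition a key lands in, while grouping itself compares keys by equality. Below is a minimal plain-Scala sketch of that assignment, with no Spark dependency. The object and method names are hypothetical, and `Vector[Byte]` is used instead of `Array[Byte]` because arrays on the JVM have identity-based `hashCode`:

```scala
// Sketch of HashPartitioner-style key assignment (assumption: Spark internals
// compute a non-negative modulo of the key's hashCode).
object HashPartitionSketch {
  // Map an arbitrary Int hash into [0, mod), handling negative hashes.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  def partitionFor(key: Any, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    // Tuples and Vectors have value-based hashCode, so structurally equal
    // keys are always assigned to the same partition.
    val k1 = (Vector[Byte](1, 2, 3), 42L)
    val k2 = (Vector[Byte](1, 2, 3), 42L)
    println(partitionFor(k1, 8) == partitionFor(k2, 8))
  }
}
```

Note that two *different* keys may still share a partition; that is harmless for grouping, because within a partition keys are distinguished by equality, not by their hash.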
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
