What hashing algorithm does Spark use to groupByKey a binary column?
My ETL pipeline, built on Spark 2.4.3, needs to group a Dataset by a BINARY column (represented as Array[Byte] in Scala) and a Long column. The BINARY column holds 16-byte values, so its key space is about 2^128. I am using the groupByKey API provided by the Spark Dataset:
val grouped = data.groupByKey(
  row => (row.binary_column, row.long_column)
)
Now I don't know how Spark ensures that there won't be any collisions. Specifically, I need to know which hashing algorithm Spark uses to prevent collisions.
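For context, Spark's shuffle assigns keys to partitions using the key's `hashCode` taken modulo the number of partitions (HashPartitioner-style); hash collisions only affect which partition a key lands in, while grouping itself compares keys by equality. Below is a minimal plain-Scala sketch of that assignment, with no Spark dependency. The object and method names are hypothetical, and `Vector[Byte]` is used instead of `Array[Byte]` because arrays on the JVM have identity-based `hashCode`:

```scala
// Sketch of HashPartitioner-style key assignment (assumption: Spark internals
// compute a non-negative modulo of the key's hashCode).
object HashPartitionSketch {
  // Map an arbitrary Int hash into [0, mod), handling negative hashes.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  def partitionFor(key: Any, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    // Tuples and Vectors have value-based hashCode, so structurally equal
    // keys are always assigned to the same partition.
    val k1 = (Vector[Byte](1, 2, 3), 42L)
    val k2 = (Vector[Byte](1, 2, 3), 42L)
    println(partitionFor(k1, 8) == partitionFor(k2, 8))
  }
}
```

Note that two *different* keys may still share a partition; that is harmless for grouping, because within a partition keys are distinguished by equality, not by their hash.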
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
