Resolving "Kryo serialization failed: Buffer overflow" Spark exception

I am trying to run Spark (Java) code and getting the error

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 27

Other posts have suggested setting the buffer to its maximum value. When I tried this with a maximum buffer value of 512MB, I got the error

java.lang.ClassNotFoundException: org.apache.spark.serializer.KryoSerializer.buffer.max', '512'

How can I solve this problem?



Solution 1:[1]

Try using "spark.kryoserializer.buffer.max.mb", "512" instead of "spark.kryoserializer.buffer.max", "512MB".
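In Java code this is set on the SparkConf before the context is created. The sketch below assumes the legacy property spelling from this solution, which takes a plain number of megabytes with no unit suffix; the class and app names are illustrative, and note that on newer Spark releases this .mb spelling is deprecated in favor of spark.kryoserializer.buffer.max with a unit, as in Solution 2:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class KryoBufferExample {
    public static void main(String[] args) {
        // Legacy spelling: the value is a plain number of megabytes, no unit suffix.
        SparkConf conf = new SparkConf()
                .setAppName("KryoBufferExample")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryoserializer.buffer.max.mb", "512");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run the job that previously hit the buffer overflow ...
        sc.stop();
    }
}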

Solution 2:[2]

The property name spark.kryoserializer.buffer.max is correct, but the value should include a size unit, so in your case it is 512m. The ClassNotFoundException in your attempt suggests the buffer property got appended to the serializer class name; spark.serializer and spark.kryoserializer.buffer.max have to be set as two separate configuration entries.

Also, depending on where you are setting up the configuration, you might have to write --conf spark.kryoserializer.buffer.max=512m. For instance, with a spark-submit command or within the <spark-opts>...</spark-opts> element of an Oozie workflow action, as sketched below.
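For the Oozie case, the relevant fragment of the Spark action might look roughly like this (a minimal sketch; the surrounding workflow and action elements are omitted):

<spark-opts>--conf spark.kryoserializer.buffer.max=512m</spark-opts>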

Solution 3:[3]

You can either set this in the Spark configuration while creating the Spark session:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .config("spark.kryoserializer.buffer.max", "512m")
         .getOrCreate())

or pass it with your spark-submit command:

spark-submit \
--verbose \
--name "JOB_NAME" \
--master MASTER_IP \
--conf "spark.kryoserializer.buffer.max=512m" \
main.py 

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Kishore
Solution 2: nessa.gp
Solution 3: MOHD NAYYAR