Shuffle Stage Failing Due To Executor Loss

I get the following error when my Spark job fails: **"org.apache.spark.shuffle.FetchFailedException: The relative remote executor(Id: 21), which maintains the block data to fetch is dead."**

Overview of my Spark job:

Input size is ~35 GB.

I broadcast-joined all the smaller tables with the mother table into, say, dataframe1, and then salted each big table and dataframe1 before joining each big table with dataframe1 (the left table); a sketch of this pattern follows.
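For context, here is a minimal PySpark sketch of that pattern. The table names, the single join key `id`, and the salt width `SALT_BUCKETS` are all hypothetical, not taken from the actual job:

```python
import pyspark.sql.functions as F
from pyspark.sql.functions import broadcast

SALT_BUCKETS = 32  # hypothetical salt width; tune to the observed key skew


def build_dataframe1(mother_df, small_dfs, key="id"):
    # Broadcast-join every small table onto the mother table so these
    # joins happen map-side, with no shuffle at all.
    df = mother_df
    for small_df in small_dfs:
        df = df.join(broadcast(small_df), on=key, how="left")
    return df


def salted_left_join(dataframe1, big_df, key="id"):
    # Left side: tag each row with a random salt bucket to spread a
    # skewed key across SALT_BUCKETS shuffle partitions.
    left = dataframe1.withColumn(
        "salt", (F.rand() * SALT_BUCKETS).cast("int")
    )
    # Right side: replicate each row once per bucket so every
    # (key, salt) pair on the left still finds its match.
    right = big_df.withColumn(
        "salt", F.explode(F.array([F.lit(i) for i in range(SALT_BUCKETS)]))
    )
    return left.join(right, on=[key, "salt"], how="left").drop("salt")
```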

Profile used:

```python
@configure(profile=[
    'EXECUTOR_MEMORY_LARGE',
    'NUM_EXECUTORS_32',
    'DRIVER_MEMORY_LARGE',
    'SHUFFLE_PARTITIONS_LARGE',
])
```
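For reference, and purely as an assumption (the values behind these Foundry profiles are defined per stack, and I don't know the real ones), the profiles roughly translate to Spark settings along these lines:

```python
from pyspark.sql import SparkSession

# Illustrative values only; the actual profile definitions may differ.
spark = (
    SparkSession.builder
    .config("spark.executor.memory", "16g")          # EXECUTOR_MEMORY_LARGE
    .config("spark.executor.instances", "32")        # NUM_EXECUTORS_32
    .config("spark.driver.memory", "16g")            # DRIVER_MEMORY_LARGE
    .config("spark.sql.shuffle.partitions", "2000")  # SHUFFLE_PARTITIONS_LARGE
    .getOrCreate()
)
```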

Using the above approach and profiles, I got the runtime down by 50%, but the shuffle stage still fails due to executor loss.

Is there a way I can fix this?


