How to do a Spark left outer join efficiently with skewed data (Spark 2.3)

I have two datasets, DS_A (300 GB CSV) and DS_B (50 GB CSV). Currently we are doing a left outer join on them:

val joinedDS = DS_A.joinWith(DS_B, DS_A("value") === DS_B("value"), "left_outer")

The data in DS_B is skewed, and the join takes forever to run. I tried the salting technique, but I ran into too many OOM exceptions. Is there any better solution? I can't move to Spark 3.
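Before salting anything, it helps to know which `value` keys are actually skewed in DS_B, since salting every key multiplies far more data than necessary. A minimal sketch, assuming DS_B is available as a DataFrame with a `value` column; `HotKeys.hotKeys` and the `threshold` cutoff are hypothetical names chosen for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, desc}

object HotKeys {
  /** Returns the key values whose row count in `df` exceeds `threshold`,
    * most frequent first. Only the (few) hot keys are collected to the driver.
    * `keyCol` and `threshold` are illustrative parameters. */
  def hotKeys(df: DataFrame, keyCol: String, threshold: Long): Array[Any] =
    df.groupBy(col(keyCol))
      .agg(count("*").as("cnt"))          // rows per key
      .filter(col("cnt") > threshold)     // keep only the skewed keys
      .orderBy(desc("cnt"))
      .select(keyCol)
      .collect()
      .map(_.get(0))
}
```

For example, `HotKeys.hotKeys(dsB, "value", 1000000L)` would list the keys with more than a million rows; the right cutoff depends on the cluster and is an assumption to tune.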
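One way to salt a left outer join without changing its result is to salt only the hot keys of the skewed side (DS_B), replicate the matching rows of the other side (DS_A) across every salt value, run an inner join, and then add back the DS_A rows that have no match using a left anti join. A minimal sketch for Spark 2.3, using the DataFrame API rather than `joinWith`; `SaltedLeftJoin.saltedLeftOuter`, the hot-key list, the salt count `n` and the seed are illustrative assumptions, and it assumes the non-key column names of the two inputs do not overlap:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

object SaltedLeftJoin {
  /** Same rows as a.join(b, Seq(key), "left_outer"), but rows whose key is in
    * `hotKeys` are spread over `n` salt buckets. Assumes the non-key column
    * names of `a` and `b` are disjoint. unionByName needs Spark 2.3+. */
  def saltedLeftOuter(a: DataFrame, b: DataFrame, key: String,
                      hotKeys: Seq[Any], n: Int): DataFrame = {
    val isHot: Column =
      if (hotKeys.isEmpty) lit(false) else col(key).isin(hotKeys: _*)

    // Skewed side: hot-key rows get a random salt in [0, n), all others get 0.
    val bSalted = b.withColumn("salt",
      when(isHot, floor(rand(42) * n).cast("int")).otherwise(lit(0)))

    // Other side: hot-key rows are copied once per salt, all others keep salt 0.
    val allSalts = array((0 until n).map(i => lit(i)): _*)
    val aSalted = a
      .withColumn("salts", when(isHot, allSalts).otherwise(array(lit(0))))
      .withColumn("salt", explode(col("salts")))
      .drop("salts")

    // Inner join: every matching (a, b) pair meets in exactly one salt bucket.
    val matched = aSalted.join(bSalted, Seq(key, "salt"), "inner").drop("salt")

    // Rows of `a` with no match in `b`; distinct() collapses hot keys to one row
    // each, so this second join is not skewed. b's value columns are null-padded.
    val bKeys = b.select(key).distinct()
    val bValueCols = b.schema.fields.filter(_.name != key)
    val unmatched = bValueCols.foldLeft(a.join(bKeys, Seq(key), "left_anti")) {
      (df, f) => df.withColumn(f.name, lit(null).cast(f.dataType))
    }

    matched.unionByName(unmatched)
  }
}
```

Salting a left outer join directly, by replicating the left side, would emit an extra null-padded row for every empty salt bucket, which is why the sketch splits the work into an inner join plus a left anti join. Because only the hot-key rows of DS_A are multiplied by `n`, the 300 GB side is not copied in full; `n` trades the size of the largest shuffle partition against that replication.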




Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
