How to do a Spark left outer join efficiently with skewed data (Spark 2.3)
I have two datasets, DS_A (300 GB CSV) and DS_B (50 GB CSV). Currently we are doing a left outer join on them:
val joinedDS = DS_A.joinWith(DS_B, DS_A("value") === DS_B("value"), "left_outer")
The data in DS_B is skewed and the join takes forever to run.
I tried the salting technique but ran into too many OOM exceptions.
Is there any better solution? I can't move to Spark 3.
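For reference, here is a minimal sketch of one salting variant that may use less memory than salting every key: only rows whose key is in a known list of hot keys get salted, and only the matching DS_B rows are replicated. The names `SaltedJoin`, `saltedLeftJoin`, `hotKeys`, `saltBuckets` and the temporary `salt` column are hypothetical and not from the original code. The sketch also assumes the plain DataFrame `join` instead of `joinWith`, and that neither input already has a column called `salt`. The random salt goes on the left side and the replication on the right side, because replicating left rows in a left outer join would produce spurious unmatched rows.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object SaltedJoin {
  // Hypothetical helper: left outer join on `key`, salting only the hot keys.
  // hotKeys is assumed to be found beforehand, e.g. with a groupBy/count on the key.
  def saltedLeftJoin(left: DataFrame, right: DataFrame, key: String,
                     hotKeys: Seq[Any], saltBuckets: Int): DataFrame = {
    val isHot = col(key).isin(hotKeys: _*)

    // Left side: hot-key rows get a random bucket in [0, saltBuckets), all others get 0.
    val saltedLeft = left.withColumn("salt",
      when(isHot, (rand() * saltBuckets).cast("int")).otherwise(lit(0)))

    // Right side: hot-key rows are copied once per bucket, all others keep a single copy.
    val allBuckets = array((0 until saltBuckets).map(i => lit(i)): _*)
    val saltedRight = right.withColumn("salt",
      explode(when(isHot, allBuckets).otherwise(array(lit(0)))))

    // Each left row matches exactly one copy of each right row with the same key.
    saltedLeft.join(saltedRight, Seq(key, "salt"), "left_outer").drop("salt")
  }
}
```

It could be called as `SaltedJoin.saltedLeftJoin(DS_A.toDF, DS_B.toDF, "value", hotKeys, 32)`. The bucket count of 32 is only a placeholder; a larger value spreads each hot key over more tasks but also multiplies the hot rows of DS_B by the same factor.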
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

