'Pyspark Delete rows in table one which matches rows in table two
Problem Statement
Here's my use case:
I have 2 tables, Today_data and Yesterday_data, for example:
Today_data:
Id Value
1 1_data
2 2_data
3 3_data
Yesterday_data:
Id Value
2 2_data
4 4_data
8 8_data
I want to delete Today_data df rows if the row matches Yesterday_data row.
Expected Result
Id Value
1 1_data
3 3_data
Approach Taken
I was thinking it should be a easy left join where Today_data will be on the left, however after I read through all the join operations in pyspark here: https://sparkbyexamples.com/pyspark/pyspark-join-explained-with-examples/#pyspark-join-types, I don't see any of them can solve my problem. Any ideas?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
