Check if PySpark column values exist in another dataframe's column values

I'm trying to work out how to check whether the values in a column of one PySpark dataframe exist in a column of another PySpark dataframe and, if they do, extract the corresponding value and compare it again. I was thinking of chaining several withColumn() calls with a when() function.

For example, my two dataframes could look something like this:

df1
| id    | value |
| ----- | ----  |
| hello | 1111  |
| world | 2222  |

df2
| id     | value |
| ------ | ----  |
| hello  | 1111  |
| world  | 3333  |
| people | 2222  |

What I want is to first check whether each value of df1.id exists in df2.id and, if it does, return the matching df2.value. For example, I was trying something like:

df1 = df1.withColumn("df2_value", when(df1.id == df2.id, df2.value))

So I get something like:

df1
| id    | value | df2_value |
| ----- | ----  | --------- |
| hello | 1111  | 1111      |
| world | 2222  | 3333      |

That way I can run another check between the two value columns in df1 and return a boolean column (1 or 0) in a new dataframe.

The result I wish to get would be something like:

df3
| id    | value | df2_value | match |
| ----- | ----  | --------- | ----- |
| hello | 1111  | 1111      | 1     |
| world | 2222  | 3333      | 0     |


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source