PySpark SQL with inline CASE WHEN join condition
This is the original SQL with inline case when condition:
select *
from table_a
LEFT JOIN table_b
ON case when table_a.key not in ('1','2') then '0' else table_a.key end = table_b.key
What is the equivalent pyspark code?
I tried the when().otherwise() function and the if() function, but neither worked.
table_a=spark.sql('''select 1 as key union select 2 as key''')
table_a.show()
table_b=spark.sql('''select 3 as key union select 0 as key''')
table_b.show()
join_condition = when(((table_a.key == '1') | (table_a.key == '2')), table_a.key == table_b.key).otherwise(('0' == table_b.key))
df = table_a.join(table_b, join_condition, 'leftouter').select(table_a['*'], table_b['*'])
df = table_a.join(table_b, ((if(((table_a.key == '1') | (table_a.key == '2')), ('0'), (table_a.key))) == table_b.key), 'leftouter')\
.select(table_a['*'], table_b['*'])
Thank you!
Solution 1:[1]
Whenever you have multiple conditions like `table_a.key == '1' | table_a.key == '2'`, you have to wrap each comparison in its own parentheses: `((table_a.key == '1') | (table_a.key == '2'))`. In Python, the bitwise `|` operator binds more tightly than `==`, so without the parentheses the expression is parsed in the wrong order and the condition fails.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pltc |
