How to do an if/else condition on PySpark columns with regex?
I have a PySpark dataframe with an event_name column:

|   | event_name |
|---|---|
| 0 | a-markets-l1 |
| 1 | a-markets-watch |
| 2 | a-markets-buy |
| 3 | a-markets-z2 |
| 4 | scroll_down |
```python
from pyspark.sql.functions import col, when

EXCLUDE_list = ["a-markets-buy", "a-markets-watch"]
expr = "a-markets"

new_df = df.withColumn("event_name",
    when(
        col('event_name').rlike(expr)
        & ~col('event_name').isin(EXCLUDE_list), 'a-markets'))
```
I am trying to replace only the values that match "a-markets" and are not in EXCLUDE_list with "a-markets".
Solution 1:[1]
```python
new_df = df.withColumn("event_name",
    when(
        col('event_name').rlike(expr)
        & ~col('event_name').isin(EXCLUDE_list), 'a-markets'
    ).otherwise(col("event_name")))
```

Adding an `.otherwise(col("event_name"))` did the trick: without it, rows that do not satisfy the `when` condition are set to null instead of keeping their original value.
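For readers without a Spark session at hand, the same replace-unless-excluded logic can be sketched in plain Python with the standard `re` module (the `relabel` helper and the sample list below are illustrative, not part of the original answer):

```python
import re

EXCLUDE_list = ["a-markets-buy", "a-markets-watch"]
expr = "a-markets"

def relabel(event_name):
    # Mirrors when(rlike(expr) & ~isin(EXCLUDE_list), 'a-markets'):
    # replace values matching the pattern that are not excluded...
    if re.search(expr, event_name) and event_name not in EXCLUDE_list:
        return "a-markets"
    # ...and, like .otherwise(col("event_name")), keep everything else.
    return event_name

events = ["a-markets-l1", "a-markets-watch", "a-markets-buy",
          "a-markets-z2", "scroll_down"]
print([relabel(e) for e in events])
# → ['a-markets', 'a-markets-watch', 'a-markets-buy', 'a-markets', 'scroll_down']
```

The final `return event_name` is exactly the branch the original question was missing: dropping it is equivalent to omitting `.otherwise`, which nulls out the non-matching rows.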
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shubh |
