How to do an if-else condition on PySpark columns with regex?

I have a PySpark dataframe:

event_name
0 a-markets-l1
1 a-markets-watch
2 a-markets-buy
3 a-markets-z2
4 scroll_down

This dataframe has an event_name column.

from pyspark.sql.functions import col, when

EXCLUDE_list = ["a-markets-buy", "a-markets-watch"]
expr = "a-markets"

new_df = df.withColumn("event_name",
                       when(col('event_name').rlike(expr)
                            & ~col('event_name').isin(EXCLUDE_list),
                            'a-markets'))

I am trying to replace only those values that match "a-markets" and are not in EXCLUDE_list with the literal value "a-markets", leaving every other value unchanged.
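Based on the sample data above, the result I expect looks like this (a-markets-l1 and a-markets-z2 collapse to a-markets, everything else stays as is):

event_name
0 a-markets
1 a-markets-watch
2 a-markets-buy
3 a-markets
4 scroll_down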



Solution 1:[1]

new_df = df.withColumn("event_name",
                       when(col('event_name').rlike(expr)
                            & ~col('event_name').isin(EXCLUDE_list),
                            'a-markets')
                       .otherwise(col("event_name")))

Adding .otherwise() did the trick: without it, every row that does not match the when condition is set to null instead of keeping its original event_name.
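For reference, here is a minimal, self-contained sketch of the whole flow. The column and variable names come from the question; the SparkSession setup and the recreated sample dataframe are assumptions added just to make it runnable:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.getOrCreate()

# Recreate the sample data from the question
df = spark.createDataFrame(
    [("a-markets-l1",), ("a-markets-watch",), ("a-markets-buy",),
     ("a-markets-z2",), ("scroll_down",)],
    ["event_name"],
)

EXCLUDE_list = ["a-markets-buy", "a-markets-watch"]
expr = "a-markets"

# Replace matching, non-excluded values; keep everything else as is
new_df = df.withColumn(
    "event_name",
    when(col("event_name").rlike(expr)
         & ~col("event_name").isin(EXCLUDE_list), "a-markets")
    .otherwise(col("event_name")),
)

new_df.show()
# Expected output:
# +---------------+
# |     event_name|
# +---------------+
# |      a-markets|
# |a-markets-watch|
# |  a-markets-buy|
# |      a-markets|
# |    scroll_down|
# +---------------+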

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Shubh