regexp_replace on string for whole-word match and not substring match
The following removes stop words, but it also matches them inside longer words:
words = words.withColumn('value_2', F.regexp_replace('value', '|'.join(stopWords), ''))
For example, one of my stop words is 'a', so 'was' becomes 'ws'. I want the replacement to apply only when 'A' or 'a' appears as a standalone word, and to leave 'was' unchanged.
Solution 1:[1]
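Place word boundaries (\b) around the alternation. A word boundary matches only between a word character and a non-word character (or the start or end of the string), so the 'a' inside 'was' no longer matches: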
words = words.withColumn('value_2', F.regexp_replace('value', '\\b(' + '|'.join(stopWords) + ')\\b', ''))
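To check the word-boundary behavior without a Spark session, here is a minimal sketch using Python's built-in re module. The stop-word list and the helper names build_stop_word_pattern and strip_stop_words are hypothetical, chosen only for illustration. The (?i) flag and the re.escape call are additions not present in the answer above: the first covers the 'A' or 'a' requirement from the question, the second guards against stop words that contain regex metacharacters.

```python
import re

# Hypothetical stop-word list; the question does not show its contents.
stop_words = ["a", "the", "is"]


def build_stop_word_pattern(words):
    """Return a regex that matches any of `words` only as a whole word."""
    # Escape each word so regex metacharacters are treated literally.
    alternation = "|".join(re.escape(w) for w in words)
    # (?i) makes the match case-insensitive; \b anchors to word boundaries.
    return r"(?i)\b(" + alternation + r")\b"


pattern = build_stop_word_pattern(stop_words)


def strip_stop_words(text):
    """Remove whole-word stop words from `text`, leaving other words intact."""
    return re.sub(pattern, "", text)


print(repr(strip_stop_words("A cat was here")))
```

Spark's regexp_replace uses Java regular expressions, which also support \b and a leading (?i), so the same pattern string should be usable as F.regexp_replace('value', pattern, ''); treat that as an assumption to verify against your own data. Note that removed words leave their surrounding spaces behind, so a follow-up whitespace cleanup may be needed.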
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Biegeleisen |
