Delete words in a string based on an array of strings in another column (PySpark)
I want to delete words from a string column when they appear in an array column on the same row of a DataFrame. The words to delete are different for every row.
For example, from the text
'I think that this is going to work',
the words in the following array should be deleted: ['going', 'work'].
Expected outcome: 'I think that this is to'
I have already tried the following function (which I then wrap in a UDF):
def remove_symbols(text, array):
    # Keep every whitespace-separated token of text that is not in array
    text = " ".join([word for word in str(text).split() if word not in array])
    return text
Here, text is the original text and array holds the words that should be deleted. However, this only deletes the first word in the array.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
