How can I delete stopwords from a column in a df?

I've been trying to delete the stopwords from a column in a df, but I'm having trouble doing it.

discografia["SSW"] = [word for word in discografia.CANCIONES if not word in stopwords.words('spanish')]

But in the new column I just get the same words as in the column "CANCIONES". What am I doing wrong? Thanks!



Solution 1:[1]

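Assuming each cell of CANCIONES holds a list of words, we can use explode to put one word per row, filter out the stopwords, and then group by the original index to rebuild the lists and assign them back to the original DataFrame.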

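import re
import pandas as pd

stopwords = ["buzz"]
df = pd.DataFrame({"CANCIONES": [["fizz", "buzz", "foo"], ["baz", "buzz"]]})

# Anchored alternation, so only a word that equals a stopword exactly is matched
words = r"^(?:" + "|".join(map(re.escape, stopwords)) + r")$"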

exploded = df.explode("CANCIONES")
print(exploded)

  CANCIONES
0      fizz
0      buzz
0       foo
1       baz
1      buzz

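# Drop the stopword rows, then rebuild one list per original row.
# Grouping on the index level keeps the original labels, so the result aligns with df
# (a row made up only of stopwords ends up as NaN).
kept = exploded.loc[~exploded.CANCIONES.str.contains(words)]
df["SSW"] = kept.groupby(level=0)["CANCIONES"].agg(list)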

print(df)
           CANCIONES          SSW
0  [fizz, buzz, foo]  [fizz, foo]
1        [baz, buzz]        [baz]
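
As for why the attempt in the question leaves the column unchanged: iterating over discografia.CANCIONES yields one whole cell per step, not one word, so a full title or lyric is never found in the stopword list and nothing is filtered out. If the cells hold plain strings rather than lists, a per-row apply may be simpler than explode. The following is only a minimal sketch under that assumption: the sample titles, the small spanish_stopwords set (a stand-in for stopwords.words('spanish') so the snippet runs without NLTK data) and the helper drop_stopwords are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for stopwords.words('spanish'); a set makes lookups fast.
spanish_stopwords = {"de", "la", "que", "el", "en", "y"}

# Made-up sample data; assumes each cell of CANCIONES is a single string.
discografia = pd.DataFrame({"CANCIONES": ["La casa de papel", "El sol y la luna"]})


def drop_stopwords(text):
    # Split the cell into words and keep only those that are not stopwords.
    return [word for word in text.split() if word.lower() not in spanish_stopwords]


# apply runs the helper once per row, so the filter works on words, not whole cells.
discografia["SSW"] = discografia["CANCIONES"].apply(drop_stopwords)
print(discografia["SSW"].tolist())
# [['casa', 'papel'], ['sol', 'luna']]
```

With the real NLTK list, building the set once (for example set(stopwords.words('spanish'))) before calling apply should avoid re-reading the list for every word.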

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 gold_cy