How can I delete stopwords from a column in a df?
I've been trying to delete the stopwords from a column in a df, but I'm having trouble doing it.
discografia["SSW"] = [word for word in discografia.CANCIONES if not word in stopwords.words('spanish')]
But in the new column I just get the same words as in the column "CANCIONES". What am I doing wrong? Thanks!
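A likely cause, assuming each CANCIONES cell holds a list of words (as in the answer below): iterating over a Series yields whole cells, not individual words. Each `word` in the comprehension is therefore an entire list. A list never equals any stopword string, so every cell passes the filter unchanged. The sketch below reproduces this with a small hypothetical stopword list standing in for `stopwords.words('spanish')`, and then filters inside each cell instead. If the cells are plain strings, they would need to be split first (for example with `words.split()`).

```python
import pandas as pd

# Hypothetical stand-in for nltk's stopwords.words("spanish"), which returns a list
spanish_stopwords = ["el", "la", "de", "y"]

discografia = pd.DataFrame(
    {"CANCIONES": [["la", "bamba"], ["el", "sol", "y", "la", "luna"]]}
)

# Original approach: iterating the Series yields whole cells (lists), not words,
# and a list never equals a string, so nothing is filtered out
discografia["SSW_broken"] = [
    word for word in discografia.CANCIONES if word not in spanish_stopwords
]

# Fix: filter the words inside each cell; a set makes membership checks fast
stop_set = set(spanish_stopwords)
discografia["SSW"] = [
    [word for word in words if word not in stop_set]
    for words in discografia["CANCIONES"]
]
print(discografia[["CANCIONES", "SSW"]])
```

For this sample, `SSW_broken` is identical to `CANCIONES`, which matches the symptom described in the question.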
Solution 1:[1]
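We can use explode to put each word on its own row, filter out the stopwords, and then group by the original index to collect the remaining words back into lists that line up with the rows of the original DataFrame.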
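import pandas as pd

# Toy stopword list; with NLTK you could use set(stopwords.words("spanish"))
stop_words = ["buzz"]
df = pd.DataFrame({"CANCIONES": [["fizz", "buzz", "foo"], ["baz", "buzz"]]})
# One row per word; each row keeps the index of the list it came from
exploded = df.explode("CANCIONES")
print(exploded)
CANCIONES
0 fizz
0 buzz
0 foo
1 baz
1 buzz
# isin matches whole words (a regex with str.contains would also drop e.g. "buzzer");
# grouping by level=0 regroups the survivors under their original row index
df["SSW"] = (
    exploded.loc[~exploded["CANCIONES"].isin(stop_words), "CANCIONES"]
    .groupby(level=0)
    .agg(list)
)
print(df)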
CANCIONES SSW
0 [fizz, buzz, foo] [fizz, foo]
1 [baz, buzz] [baz]
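One caveat with regrouping by index: a row whose words are all stopwords has nothing left to group, so it drops out of the grouped result. Likewise, explode turns an empty list into a single NaN row. A minimal sketch of one way to handle both cases, assuming an empty list is the desired result for such rows, is to drop the NaN rows, reindex the regrouped lists against the original index, and substitute `[]` for the gaps:

```python
import pandas as pd

stop_words = {"buzz"}
# Row 1 contains only stopwords and row 2 is an empty list
df = pd.DataFrame({"CANCIONES": [["fizz", "buzz", "foo"], ["buzz"], []]})

exploded = df.explode("CANCIONES")  # [] becomes a single NaN row
kept = exploded.loc[~exploded["CANCIONES"].isin(stop_words), "CANCIONES"].dropna()

# Regroup survivors by original index; rows 1 and 2 have no survivors,
# so reindex brings them back as NaN
regrouped = kept.groupby(level=0).agg(list).reindex(df.index)

# Replace the NaN gaps with empty lists
df["SSW"] = pd.Series(
    [cell if isinstance(cell, list) else [] for cell in regrouped], index=df.index
)
print(df)
```

The list comprehension is used instead of fillna because fillna does not accept a list as the fill value.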
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | gold_cy |
