Python/DataFrame: Count Unique Words in Each Column Cell (Not Counting Same Words in the Same Column Cell)
I thought this would be easy, but I'm having trouble finding an answer.
I want to count unique words in each column cell. If the same word repeats in the same cell, I want to count it only once.
For example:
1st: "I waited and waited and eventually left the hospital"
2nd: "I waited only 1 hour. My experience wasn't so bad"
What I want:
- waited: 2 (even though "waited" appears twice in the first column cell, I want to count it only once per cell, so the total is 2: one from the 1st, one from the 2nd)
- hospital: 1
- experience: 1, and so on...
I tried this code:
Reviews_Freq_Words=Reviews.ReviewText2.apply(lambda x: pd.value_counts(x.split(" "))).sum(axis = 0)
Any thoughts?
Solution 1:[1]
I came up with two different methods. I'm not sure which one performs better, but you can try them both out for yourself.
# Method 1: de-duplicate the words of each cell with set() before counting
Reviews_Freq_Words = Reviews.ReviewText2.apply(lambda x: pd.value_counts(list(set(x.split(" "))))).sum(axis = 0)
# Method 2: de-duplicate the words of each cell with pd.unique() before counting
Reviews_Freq_Words = Reviews.ReviewText2.apply(lambda x: pd.value_counts(pd.unique(x.split()))).sum(axis = 0)
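The same per-cell de-duplication can also be sketched without building an intermediate DataFrame, by feeding each cell's set of words into a `collections.Counter`. This is only an illustration under assumptions: the `Reviews` frame is rebuilt here from the two sentences in the question, and the name `counter` is made up for the example. It also avoids the top-level `pd.value_counts`, which, as far as I know, recent pandas releases flag as deprecated in favour of `Series.value_counts`.

```python
from collections import Counter

import pandas as pd

# Sample data taken from the question; the real Reviews frame is assumed
# to have a text column named ReviewText2.
Reviews = pd.DataFrame({
    "ReviewText2": [
        "I waited and waited and eventually left the hospital",
        "I waited only 1 hour. My experience wasn't so bad",
    ]
})

# Count each word at most once per cell: set() drops repeats within a cell,
# and Counter.update() adds 1 for every distinct word in that cell.
counter = Counter()
for text in Reviews["ReviewText2"]:
    counter.update(set(text.split()))

# Convert to a Series so it can be sorted and indexed like the other results.
Reviews_Freq_Words = pd.Series(counter).sort_values(ascending=False)
print(Reviews_Freq_Words)
```

Whether this is faster than the `apply` versions depends on the data, so it is worth timing on the real column.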
Solution 2:[2]
If I'm understanding correctly, each column cell holds a sentence?
I'm new to pandas too, so I just tried it out. This worked for me:
import pandas as pd
data = ["I waited and waited and eventually left the hospital","I waited only 1 hour. My experience wasn't so bad"]
df = pd.DataFrame(data, columns=['sentences'])
result = df['sentences'].apply(lambda x: list(set(x.split(' ')))).explode().value_counts()
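One thing to watch with splitting on spaces: punctuation stays attached, so "hour." is counted as a different word from "hour", and "I" differs from "i". A possible variant, assuming you want case-insensitive words made of letters, digits and apostrophes (that tokenisation rule is my assumption, not something the question specifies), could look like this:

```python
import pandas as pd

data = [
    "I waited and waited and eventually left the hospital",
    "I waited only 1 hour. My experience wasn't so bad",
]
df = pd.DataFrame(data, columns=["sentences"])

result = (
    df["sentences"]
    .str.lower()                                 # treat "I" and "i" as the same word
    .str.findall(r"[a-z0-9']+")                  # assumed token rule: letters, digits, apostrophes
    .apply(lambda words: sorted(set(words)))     # keep each word once per cell
    .explode()                                   # one row per (cell, word)
    .value_counts()                              # number of cells containing each word
)
print(result)
```

Here `result["waited"]` is 2 and "hour" appears without the trailing period. Adjust the regular expression if your reviews contain accented characters or hyphenated words.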
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Aman Mohandas |
| Solution 2 | wk14 |
