Count symbols/punctuation in tweets
I have a pandas DataFrame of tweets with a sentiment column, and I need to count the punctuation characters in each tweet. The data is
Tweet Sentiment
Once upon a time, in the middle of nowhere, ... ! 0
What are you f*** do? -1
It's a lovely day!! :) 1
My desired output would be
Tweet Sentiment Punctuation_count
Once upon a time, in the middle of nowhere, ... ! 0 6
What are you f*** do? -1 4
It's a lovely day!! :) 1 5
If I wanted to remove punctuation, I would use:
df["Punctuation"] = df['Tweet'].str.replace(r'[^\w\s]', '', regex=True)
But what I would like to do is count the punctuation in each Tweet.
Solution 1:[1]
One option is to count, for each tweet, how many of its characters appear in string.punctuation:
import string
df['Punctuation_count'] = df['Tweet'].apply(lambda x: sum(el in string.punctuation for el in x))
Output:
Tweet Sentiment Punctuation_count
0 Once upon a time, in the middle of nowhere, ... ! 0 6
1 What are you f*** do? -1 4
2 It's a lovely day!! :) 1 5
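For reference, here is a self-contained sketch that rebuilds the sample data and reproduces the counts above. It also shows two things that are not part of the original answer and should be read as assumptions: a vectorized alternative using Series.str.count with the same [^\w\s] pattern from the question (the column name Punctuation_count_regex is made up for illustration), and a groupby that totals the counts per sentiment label, in case "by sentiment" means an aggregate.

```python
import string

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Tweet": [
            "Once upon a time, in the middle of nowhere, ... !",
            "What are you f*** do?",
            "It's a lovely day!! :)",
        ],
        "Sentiment": [0, -1, 1],
    }
)

# Approach from the answer: test each character against string.punctuation,
# which contains ASCII punctuation only.
df["Punctuation_count"] = df["Tweet"].apply(
    lambda text: sum(ch in string.punctuation for ch in text)
)

# Alternative (an assumption, not from the answer): count regex matches.
# [^\w\s] matches any character that is neither a word character nor
# whitespace, so "_" is NOT counted here, while non-ASCII symbols are.
df["Punctuation_count_regex"] = df["Tweet"].str.count(r"[^\w\s]")

# Hypothetical follow-up: total punctuation per sentiment label.
per_sentiment = df.groupby("Sentiment")["Punctuation_count"].sum()

print(df)
print(per_sentiment)
```

The two approaches agree on this sample, but they can differ on other input: string.punctuation includes the underscore and excludes non-ASCII symbols such as curly quotes or emoji, whereas the regex does the opposite on both points.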
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
