'Pandas Groupby: Count and mean combined
Working with pandas to try and summarise a data frame as a count of certain categories, as well as the means sentiment score for these categories.
There is a table full of strings that have different sentiment scores, and I want to group each text source by saying how many posts they have, as well as the average sentiment of these posts.
My (simplified) data frame looks like this:
source text sent
--------------------------------
bar some string 0.13
foo alt string -0.8
bar another str 0.7
foo some text -0.2
foo more text -0.5
The output from this should be something like this:
source count mean_sent
-----------------------------
foo 3 -0.5
bar 2 0.415
The answer is somewhere along the lines of:
df['sent'].groupby(df['source']).mean()
Yet only gives each source and it's mean, with no column headers.
Solution 1:[1]
In newer versions of pandas you don't need the rename anymore, just use named aggregation:
df = df.groupby('source') \
.agg(count=('text', 'size'), mean_sent=('sent', 'mean')) \
.reset_index()
print (df)
source count mean_sent
0 bar 2 0.415
1 foo 3 -0.500
Solution 2:[2]
Below one should work fine:
df[['source','sent']].groupby('source').agg(['count','mean'])
Solution 3:[3]
For those who were looking for aggregations for more than two columns (as I were): just add those to 'agg'.
df = df.groupby(['id']).agg({'texts': 'size', 'char_num': 'mean', 'bytes': 'mean'}).reset_index()
Solution 4:[4]
A shorter version to achieve this is:
df.groupby('source')['sent'].agg(count='size', mean_sent='mean').reset_index()
The nice thing about this is that you can extend it if you want to take the mean of multiple variables but only count once. In this case you will have to pass a dictionary:
df.groupby('source')['sent1', 'sent2'].agg({'count': 'size', 'means': 'mean'}).reset_index()
Solution 5:[5]
I think this should provide the output that you wanted:
result = pd.DataFrame(df.groupby('source').size())
results['mean_score'] = df.groupby('source').sent.mean()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Henry Ecker |
| Solution 2 | h4z3 |
| Solution 3 | João |
| Solution 4 | Henry Ecker |
| Solution 5 | Henry Ecker |
