'How to group one column and rank values by grouped totals of another column
Situation
I have a dataframe of tweets pulled from the Twitter API. Each tweet has an author_id parameter, and a retweets and an engagements parameter.
I'd like to group the tweets by author_id, as some authors have posted more than one tweet, and then rank them by the total number of engagements from the sum of each author's tweets.
Example Dataframe
| index | text | author_id | retweets | engagement |
|---|---|---|---|---|
| 2022-03-20 | Max Muncy is great! | 100 | 3 | 2 |
| 2022-03-21 | The worst Dodger is Max muncy. | 101 | 1 | 8 |
| 2022-03-24 | Max Muncy was great, but not so much now. | 101 | 1 | 4 |
| 2022-03-28 | What a fantastic guy, that Max muncy. | 102 | 4 | 5 |
| 2022-03-31 | Max Muncy, my vote for MVP. | 103 | 8 | 17 |
What I tried
df01a.groupby(['author_id']).sum()
That returned this:
| author_id | text | retweets | engagement |
|---|---|---|---|
| 100 | Max Muncy is great! | 3 | 2 |
| 101 | The worst Dodger is Max muncy.Max Muncy was great, but not so much now. | 11 | 84 |
| 102 | What a fantastic guy, that Max muncy. | 4 | 5 |
| 103 | Max Muncy, my vote for MVP. | 8 | 17 |
Assessment
The first thing I notice is that the retweet and engagement figures are concatenating instead of summing, and I've not been able to sort out why, as the sum method seems pretty straight-forward.
Thank you for any input on what I might be doing wrong here.
UPDATE - 4/5
Just noticed the note on datatypes so posting that here:
text object
author_id object
retweets object
engagement object
dtype: object
Solution 1:[1]
First make sure the engagement column is numerical and not a string:
df['engagement'] = pd.to_numeric(df['engagement'])
From what you are saying you do not need to use groupby here. Simply sort by author_id and engagment.
df.sort_values(['author_id', 'engagement'], ascending=[True, True])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | le_camerone |
