'Calculating percentages for multiple columns
I have a dataset with the following structure
| index | candidato | Page Name | Post Created Date | Total Interactions | Likes | Shares | Comments | Love | Angry |
|---|---|---|---|---|---|---|---|---|---|
| 0 | António Costa | Observador | 2022-01-03 | 4500 | 340 | 400 | 433 | 545 | 565 |
There are 9 different candidato (candidates) and 27 different Page Name
Full dataset can be found here
I need to find a way to calculate, for each Page Name, the totals and the percentage of Total Interactions, Likes, Shares, Comments, Love, and Angry
that will result in a DataFrame with the following structure
| candidato | Page Name | Total Interactions | Total Interactions Percentage | Total Likes | Total Likes Percentage | Other Columns | Other Columns Percentage |
|---|---|---|---|---|---|---|---|
| António Costa | Observador | 6500 | 34 | 23 | 1% | 540 | 23% |
| Rui Rio | Observador | 4500 | 23 | value | percentage | value | percentage |
The reason why I need to calculate this is in order to produce a percent stacked bar chart such as this one:

What is the best way to achieve this with Pandas? Thank you in advance for your help.
Disclosure This question is to help in a non-for-profit project that analyzes media behaviour, and bias, towards Portuguese candidates to the 2022 general elections. The prior report was made using Google Sheets but analyzing the datasets with Python is the best way, since I plan on doing this every 3 months.
The GitHub repo can be found here, where you can access all datasets and code used.
Solution 1:[1]
After getting the data via:
!wget https://raw.githubusercontent.com/JorgeMiguelGomes/LEG2022_MediaMonitor/main/legislativas_2022_media_monitor_29jan2022/data_products/legislativas_2022_all_candidates_filtered.csv
and reading in csv to dataframe:
df = pd.read_csv("legislativas_2022_all_candidates_filtered.csv")
df = df.drop(columns=["Unnamed: 0"])
You can groupby Page Name and sum the values in the columns you mentioned:
col_list = ["Total Interactions","Likes","Shares","Comments","Love","Angry"]
totals = df.groupby("Page Name")[col_list].sum()
The first 3 lines will look like:
Total Interactions Likes Shares Comments Love Angry
Page Name
AgĂȘncia Lusa 4048 1464 285 1011 79 127
CM TV 21813 7934 1050 7824 401 627
CNN Portugal 216437 59776 8755 107879 3640 5198
To get percentages, you can do:
percentages = totals/df[col_list].sum()*100
Sample from result:
Total Interactions Likes ... Love Angry
Page Name ...
AgĂȘncia Lusa 0.324301 0.403282 ... 0.408945 0.371085
CM TV 1.747521 2.185548 ... 2.075784 1.832048
CNN Portugal 17.339582 16.466265 ... 18.842530 15.188172
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | zabop |
