'Calculating percentages for multiple columns

I have a dataset with the following structure

index candidato Page Name Post Created Date Total Interactions Likes Shares Comments Love Angry
0 António Costa Observador 2022-01-03 4500 340 400 433 545 565

There are 9 different candidato (candidates) and 27 different Page Name

Full dataset can be found here

I need to find a way to calculate, for each Page Name, the totals and the percentage of Total Interactions, Likes, Shares, Comments, Love, and Angry that will result in a DataFrame with the following structure

candidato Page Name Total Interactions Total Interactions Percentage Total Likes Total Likes Percentage Other Columns Other Columns Percentage
António Costa Observador 6500 34 23 1% 540 23%
Rui Rio Observador 4500 23 value percentage value percentage

The reason why I need to calculate this is in order to produce a percent stacked bar chart such as this one: Percent Stacked Bar Chart

What is the best way to achieve this with Pandas? Thank you in advance for your help.

Disclosure This question is to help in a non-for-profit project that analyzes media behaviour, and bias, towards Portuguese candidates to the 2022 general elections. The prior report was made using Google Sheets but analyzing the datasets with Python is the best way, since I plan on doing this every 3 months.

The GitHub repo can be found here, where you can access all datasets and code used.



Solution 1:[1]

After getting the data via:

!wget https://raw.githubusercontent.com/JorgeMiguelGomes/LEG2022_MediaMonitor/main/legislativas_2022_media_monitor_29jan2022/data_products/legislativas_2022_all_candidates_filtered.csv

and reading in csv to dataframe:

df = pd.read_csv("legislativas_2022_all_candidates_filtered.csv")
df = df.drop(columns=["Unnamed: 0"])

You can groupby Page Name and sum the values in the columns you mentioned:

col_list = ["Total Interactions","Likes","Shares","Comments","Love","Angry"]
totals = df.groupby("Page Name")[col_list].sum()

The first 3 lines will look like:

              Total Interactions  Likes  Shares  Comments  Love  Angry
Page Name                                                             
AgĂȘncia Lusa                4048   1464     285      1011    79    127
CM TV                      21813   7934    1050      7824   401    627
CNN Portugal              216437  59776    8755    107879  3640   5198

To get percentages, you can do:

percentages = totals/df[col_list].sum()*100

Sample from result:

              Total Interactions      Likes  ...       Love      Angry
Page Name                                    ...                      
AgĂȘncia Lusa            0.324301   0.403282  ...   0.408945   0.371085
CM TV                   1.747521   2.185548  ...   2.075784   1.832048
CNN Portugal           17.339582  16.466265  ...  18.842530  15.188172

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 zabop