'Pandas Python - How to create new columns with MultiIndex from pivot table

I have created a pivot table with 2 different types of values i) Number of apples from 2017-2020, ii) Number of people from 2017-2020. I want to create additional columns to calculate iii) Apples per person from 2017-2020. How can I do so?

Current code for pivot table:

tdf = df.pivot_table(index="States",
                     columns="Year",
                     values=["Number of Apples","Number of People"],
                     aggfunc= lambda x: len(x.unique()),
                     margins=True)
tdf 

Here is my current pivot table:

                Number of Apples                  Number of People
                2017    2018    2019    2020      2017    2018    2019    2020   
California        10      18      20      25         2       3       4       5
West Virginia      8      35      25      12         2       5       5       4
...

I want my pivot table to look like this, where I add additional columns to divide Number of Apples by Number of People.

                Number of Apples                  Number of People                  Number of Apples per Person
                2017    2018    2019    2020      2017    2018    2019    2020      2017    2018    2019    2020   
California        10      18      20      25         2       3       4       5       5       6       5       5      
West Virginia      8      35      25      12         2       5       5       4       4       7       5       3

I've tried a few things, such as:

  • Creating a new column via assigning new column names, but does not work with multiple column index tdf["Number of Apples per Person"][2017] = tdf["Number of Apples"][2017] / tdf["Number of People"][2017]
  • Tried the other assignment method tdf.assign(tdf["Number of Apples per Person"][2017] = tdf["Enrollment ID"][2017] / tdf["Student ID"][2017]); got this error SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

Appreciate any help! Thanks



Solution 1:[1]

Given

df
              Number of Apples                Number of People               
                          2017 2018 2019 2020             2017 2018 2019 2020
California                  10   18   20   25                2    3    4    5
West Virginia                8   35   25   12                2    5    5    4

You can index on the first level to get sub-frames and then divide. The division will be auto-aligned on the columns.

df['Number of Apples'] / df['Number of People']
               2017  2018  2019  2020
California      5.0   6.0   5.0   5.0
West Virginia   4.0   7.0   5.0   3.0

Append this back to your DataFrame:

pd.concat([df, pd.concat([df['Number of Apples'] / df['Number of People']], keys=['Result'], axis=1)], axis=1)
              Number of Apples                Number of People                Result               
                          2017 2018 2019 2020             2017 2018 2019 2020   2017 2018 2019 2020
California                  10   18   20   25                2    3    4    5    5.0  6.0  5.0  5.0
West Virginia                8   35   25   12                2    5    5    4    4.0  7.0  5.0  3.0

This is fast since it is completely vectorized.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1