'In Pandas, how can I create a pivot table that returns corresponding column value, for a max in another column?

sorry if this sounds like a silly question, I'm very new to python and pandas and I've been having trouble with a pivot table. Any help would be appreciated.

I have a file which looks like this:

Idx State Sex  Year    Name  Count
0    GA   F  1910    Mary    841
1    GA   F  1910   Annie    553
2    GA   F  1910  Mattie    320
3    GA   F  1910    Ruby    279
4    GA   F  1910  Willie    275
5    GA   F  1910  Louise    231

I wanted to create a pivot table, which returns a column of sequential years [one row for each], along with a column for the Name which has the highest count, and a Count column showing that maximum eg

Year    Name  Count
1910    Mary    1841
1911    Paul    553
1912   June     9620

I've tried:

import pandas as pd
bn = pd.read_csv("baby_names.csv")
bn['Count'].astype(int)
by_yr = pd.pivot_table(data=bn, values="Count", index=['Year','Name'], aggfunc='max' )
print(by_yr.head(10))

as well as:

import pandas as pd
bn = pd.read_csv("baby_names.csv")
bn['Count'].astype(int)
by_yr = pd.pivot_table(data=bn, values=['Name',"Count"], index='Year', aggfunc='max' )
print(by_yr.head(10))

but neither of them work. I'm sure I'm missing something stupid but any help is appreciated. Thanks



Solution 1:[1]

You can use groupby('Year') and nlargest to find, for each group/year, the row of the maximum Count.

step1 = df.groupby('Year').apply(lambda df: df.nlargest(1, 'Count'))

Then takes care of the formatting with

step1.droplevel(1, axis=0)[['Name', 'Count']].reset_index()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Raymond Kwok