'In Pandas, how can I create a pivot table that returns corresponding column value, for a max in another column?
sorry if this sounds like a silly question, I'm very new to python and pandas and I've been having trouble with a pivot table. Any help would be appreciated.
I have a file which looks like this:
Idx State Sex Year Name Count
0 GA F 1910 Mary 841
1 GA F 1910 Annie 553
2 GA F 1910 Mattie 320
3 GA F 1910 Ruby 279
4 GA F 1910 Willie 275
5 GA F 1910 Louise 231
I wanted to create a pivot table, which returns a column of sequential years [one row for each], along with a column for the Name which has the highest count, and a Count column showing that maximum eg
Year Name Count
1910 Mary 1841
1911 Paul 553
1912 June 9620
I've tried:
import pandas as pd
bn = pd.read_csv("baby_names.csv")
bn['Count'].astype(int)
by_yr = pd.pivot_table(data=bn, values="Count", index=['Year','Name'], aggfunc='max' )
print(by_yr.head(10))
as well as:
import pandas as pd
bn = pd.read_csv("baby_names.csv")
bn['Count'].astype(int)
by_yr = pd.pivot_table(data=bn, values=['Name',"Count"], index='Year', aggfunc='max' )
print(by_yr.head(10))
but neither of them work. I'm sure I'm missing something stupid but any help is appreciated. Thanks
Solution 1:[1]
You can use groupby('Year') and nlargest to find, for each group/year, the row of the maximum Count.
step1 = df.groupby('Year').apply(lambda df: df.nlargest(1, 'Count'))
Then takes care of the formatting with
step1.droplevel(1, axis=0)[['Name', 'Count']].reset_index()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Raymond Kwok |
