'How do you Group, Sort, and Extract 2nd Highest Quantity from Pandas DataFrame?
I'm looking to group a pandas dataframe according to customer, sort it according to quantity, and return the entire row containing the 2nd highest value for quantity.
I have something like this:
customer item quantity
0 A p 5
1 A p 8
2 A q 9
3 A q 2
4 B p 3
5 B p 6
I'm able to return a dataframe containing the rows with the maximum quantity. I used:
idx=df.groupby(by='customer')['quantity'].idxmax()
df_max=df.loc[idx,]
This returned the following Dataframe:
customer item quantity
2 A q 9
5 B p 6
However, I also need a Dataframe with the second-highest quantity, and I'm stumped.
I'd like a Dataframe returned that looks like this:
customer item quantity
1 A p 8
4 B p 3
Thanks!
Solution 1:[1]
You can try pandas.core.groupby.GroupBy.nth to take the nth row from a sorted dataframe, note the index starts from 0.
out = (df.sort_values('quantity', ascending=False)
.groupby('customer', as_index=False).nth(1))
print(out)
customer item quantity
1 A p 8
4 B p 3
Solution 2:[2]
df.groupby("customer").agg({'quantity': list}).apply(lambda x: sorted(x.quantity)[-2], axis=1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ynjxsjmh |
| Solution 2 | MoRe |
