Efficient way to get the N largest values of a column
I need to get the w highest values of a column, grouping by country.
The code below works:
w = 100
df.groupby('country').apply(lambda x: x.sort_values('x', ascending=False).head(w))
Is there a way to make this code more efficient? My dataset is huge, around 30 million rows (30kk).
Solution 1:[1]
You can try pandas.core.groupby.SeriesGroupBy.nlargest. It is a SeriesGroupBy method, so select the column before calling it:
w = 100
df.groupby('country')['x'].nlargest(w)
According to the docs:
"Faster than .sort_values(ascending=False).head(n) for small n relative to the size of the Series object."
Since your w = 100 is small relative to 30 million rows, it should be faster, provided each country has many more than w rows.
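Note that nlargest on a single column returns only that column's values, indexed by country and the original row label. If the full rows are needed, as with the sort_values/head version in the question, one possible sketch is below. It assumes the DataFrame index is unique; the helper name top_w_per_group, the extra column y, and the sample data are hypothetical and only for illustration.

```python
import pandas as pd


def top_w_per_group(df, group_col, value_col, w):
    """Return the full rows holding the w largest value_col entries per group.

    Assumes df has a unique index (hypothetical helper, not part of pandas).
    """
    # nlargest on the grouped column yields a Series whose last index
    # level carries the original row labels of the selected values.
    top = df.groupby(group_col)[value_col].nlargest(w)
    row_labels = top.index.get_level_values(-1)
    # Use those labels to pull the complete rows from the original frame.
    return df.loc[row_labels]


# Small made-up frame standing in for the real data.
df = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B'],
    'x': [1, 5, 3, 7, 2],
    'y': ['a', 'b', 'c', 'd', 'e'],
})

result = top_w_per_group(df, 'country', 'x', 2)
print(result)
```

On a large frame it is worth timing this against the original apply version, since the gain depends on how large each group is relative to w.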
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ynjxsjmh |
