'Selecting first row from each subgroup (pandas)
How to select the subset of rows where distance is lowest, grouping by date and p columns?
df
v p distance date
0 14.6 sst 22454.1 2021-12-30
1 14.9 sst 24454.1 2021-12-30
2 14.8 sst 33687.4 2021-12-30
3 1.67 wvht 23141.8 2021-12-30
4 1.9 wvht 24454.1 2021-12-30
5 1.8 wvht 24454.1 2021-12-30
6 1.7 wvht 23141.4 2021-12-31
7 2.1 wvht 24454.1 2021-12-31
Ideally, the returned dataframe should contain:
df
v p distance date
0 14.6 sst 22454.1 2021-12-30
3 1.67 wvht 23141.8 2021-12-30
6 1.7 wvht 23141.4 2021-12-31
Solution 1:[1]
One way is to use groupby + idxmin to get the index of the smallest distance per group, then use loc to get the desired output:
out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]
Output:
v p distance date
0 14.60 sst 22454.1 2021-12-30
3 1.67 wvht 23141.8 2021-12-30
6 1.70 wvht 23141.4 2021-12-31
Solution 2:[2]
sort values by p and distance. Drop any duplicates keeping first occurance in each p and date
df.sort_values(by=['p', 'distance']).drop_duplicates(subset=['p','date'],keep='first')
v p distance date
0 14.60 sst 22454.1 2021-12-30
6 1.70 wvht 23141.4 2021-12-31
3 1.67 wvht 23141.8 2021-12-30
Solution 3:[3]
If you don't need original indexes then you can use .first() or .min('distance') and later reset_index().
df.groupby(['date', 'p']).first().reset_index()
import pandas as pd
text = '''v p distance date
0 14.6 sst 22454.1 2021-12-30
1 14.9 sst 24454.1 2021-12-30
2 14.8 sst 33687.4 2021-12-30
3 1.67 wvht 23141.8 2021-12-30
4 1.9 wvht 24454.1 2021-12-30
5 1.8 wvht 24454.1 2021-12-30
6 1.7 wvht 23141.4 2021-12-31
7 2.1 wvht 24454.1 2021-12-31'''
import io
df = pd.read_csv(io.StringIO(text), sep='\s+')
df.groupby(['date', 'p']).first().reset_index()
Result:
date p v distance
0 2021-12-30 sst 14.60 22454.1
1 2021-12-30 wvht 1.67 23141.8
2 2021-12-31 wvht 1.70 23141.4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | wwnde |
| Solution 3 |
