Pandas dataframe get first row of each group
I have a pandas DataFrame like the following:
df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
'value' : ["first","second","second","first",
"second","first","third","fourth",
"fifth","second","fifth","first",
"first","second","third","fourth","fifth"]})
I want to group this by ["id","value"] and get the first row of each group:
id value
0 1 first
1 1 second
2 1 second
3 2 first
4 2 second
5 3 first
6 3 third
7 3 fourth
8 3 fifth
9 4 second
10 4 fifth
11 5 first
12 6 first
13 6 second
14 6 third
15 7 fourth
16 7 fifth
Expected outcome:
id value
1 first
2 first
3 first
4 second
5 first
6 first
7 fourth
I tried the following, which only gives the first row of the DataFrame. Any help with this is appreciated.
In [25]: for index, row in df.iterrows():
....: df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])
Solution 1:[1]
This will give you the second row of each group (nth is zero-indexed, so nth(0) gives the first row, like first() except in how NaNs are handled):
df.groupby('id').nth(1)
Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
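A minimal, self-contained sketch of nth() on the DataFrame from the question. Depending on the pandas version, the result is indexed either by id or by the original row labels, so the sketch only reads the value column, which is the same in both cases. The names first_rows and second_rows are illustrative, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# nth(0): the first row of every id group.
first_rows = df.groupby("id").nth(0)

# nth(1): the second row of every id group; groups with a single
# row (id 5 here) have no second row and are left out.
second_rows = df.groupby("id").nth(1)

print(first_rows["value"].tolist())
print(second_rows["value"].tolist())
```

Note that a group with fewer than n+1 rows simply contributes nothing to nth(n), so the two results can have different lengths.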
Solution 2:[2]
I'd suggest using .nth(0) rather than .first() if you need to get the first row.
The difference between them is how they handle NaNs: .nth(0) returns the first row of each group regardless of the values in that row, while .first() returns the first non-NaN value in each column.
E.g. if your dataset is:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4],
'value' : ["first","second","third", np.nan,
"second","first","second","third",
"fourth","first","second"]})
>>> df.groupby('id').nth(0)
value
id
1 first
2 NaN
3 first
4 first
And
>>> df.groupby('id').first()
value
id
1 first
2 second
3 first
4 first
Solution 3:[3]
If you only need the first row from each group, you can use drop_duplicates. Note that its default is keep='first'.
df.drop_duplicates('id')
Out[1027]:
id value
0 1 first
3 2 first
5 3 first
9 4 second
11 5 first
12 6 first
15 7 fourth
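To make the role of the keep parameter concrete, here is a small sketch on the question's data comparing the default keep='first' with keep='last'. The variable names are illustrative. It relies on the rows already being in the order you care about, since drop_duplicates keeps rows by position.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# Default keep='first': keep the first occurrence of each id.
first_per_id = df.drop_duplicates("id")

# keep='last': keep the last occurrence of each id instead.
last_per_id = df.drop_duplicates("id", keep="last")

print(first_per_id)
print(last_per_id)
```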
Solution 4:[4]
Maybe this is what you want:
import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'], ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)
                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31
df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)
Out[29]:
pop
state1 county3 65
county4 42
county2 15
state2 county1 78
county2 67
county3 55
Solution 5:[5]
I suppose "first" means you have already sorted your DataFrame the way you want.
What I do is:
df.groupby('id').agg('first')
value
id
1 first
2 first
3 first
4 second
5 first
6 first
7 fourth
The nice thing is that you can plug in any function you want:
df.groupby('id').agg(['first','last','count'])
value
first last count
id
1 first second 3
2 first second 2
3 first fifth 4
4 second fifth 2
5 first first 1
6 first third 3
7 fourth fifth 2
The output DataFrame has MultiIndex columns:
MultiIndex([('value', 'first'),
('value', 'last'),
('value', 'count')],
)
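If the MultiIndex columns are inconvenient, one option is pandas' named aggregation, which produces flat column names. This is a sketch of that alternative, not part of the original answer; the output column names first_value, last_value and n_rows are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# Named aggregation: new_column=(source_column, aggregation).
# The result has ordinary single-level columns.
summary = df.groupby("id").agg(
    first_value=("value", "first"),
    last_value=("value", "last"),
    n_rows=("value", "count"),
)

print(summary)
```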
Solution 6:[6]
Considering that the 'id' column is of numeric type, such as int32/int64, one might also use groupby.rank() as follows:
[In]: df[df.groupby('value')['id'].rank() == 1]
[Out]:
id value
0 1 first
6 3 third
7 3 fourth
8 3 fifth
If one wants to reset the index, just chain .reset_index(), such as:
[In]: df[df.groupby('value')['id'].rank() == 1].reset_index()
[Out]:
index id value
0 0 1 first
1 6 3 third
2 7 3 fourth
3 8 3 fifth
If the index and id columns are not needed, drop them from the filtered result:
[In]: df[df.groupby('value')['id'].rank() == 1].reset_index().drop(['index', 'id'], axis=1)
[Out]:
value
0 first
1 third
2 fourth
3 fifth
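Note that "second" is missing from the output above. rank() defaults to method='average', so the two rows with id 1 and value "second" tie at rank 1.5 and neither equals 1. The sketch below, an illustration rather than part of the original answer, shows that passing method='first' breaks ties by order of appearance so that every value group keeps exactly one row. The names default_mask and first_mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# Default method='average': tied ids share a fractional rank,
# so a group whose smallest id is duplicated has no rank-1 row.
default_mask = df.groupby("value")["id"].rank() == 1

# method='first': ties are ranked in order of appearance,
# so each group has exactly one row with rank 1.
first_mask = df.groupby("value")["id"].rank(method="first") == 1

print(df[default_mask])
print(df[first_mask])
```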
Solution 7:[7]
You can use the method take that accepts a list of indices of elements to select:
df.groupby('id').take([0])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | wij |
| Solution 2 | |
| Solution 3 | BENY |
| Solution 4 | Siraj S. |
| Solution 5 | kidpixo |
| Solution 6 | |
| Solution 7 | Mykola Zotko |
