Pandas Groupby and Sum Only One Column
So I have a DataFrame, df, that looks like the following:
   A    B   C
1  foo  12  California
2  foo  22  California
3  bar   8  Rhode Island
4  bar  32  Rhode Island
5  baz  15  Ohio
6  baz  26  Ohio
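To make the example reproducible, the frame can be rebuilt as shown below. This is a sketch: the integer index 1 to 6 is copied from the printout above, and the name df matches the groupby calls used in the rest of the thread.

```python
import pandas as pd

# Rebuild the example frame; the index 1..6 mirrors the printed output.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)
print(df)
```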
I want to group by column A and then sum column B while keeping the value in column C. Something like this:
   A    B   C
1  foo  34  California
2  bar  40  Rhode Island
3  baz  41  Ohio
The issue is that when I run
df.groupby('A').sum()
column C gets removed, returning
      B
A
bar  40
baz  41
foo  34
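Column C disappears because it is not numeric, and GroupBy.sum keeps only numeric columns when numeric_only is in effect. The default for that flag has changed between pandas releases (pandas 2.0 switched it to False, in which case the strings in C would typically be concatenated rather than dropped), so the minimal sketch below passes numeric_only=True explicitly to reproduce the output shown above. It assumes a pandas version whose GroupBy.sum accepts that keyword.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

# Only numeric columns are summed; C is dropped, and A becomes the (sorted) index.
res = df.groupby("A").sum(numeric_only=True)
print(res)
```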
How can I get around this and keep column C when I group and sum?
Solution 1:[1]
If you don't care which value from column C you keep and just want the nth value in each group, you could do this:
n = 0  # position within each group of the C value to keep (0 = first, -1 = last)
df.groupby('A').agg({'B': 'sum',
                     'C': lambda x: x.iloc[n]})
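A self-contained run of this answer on the question's data, assuming n = 0 (the first row of each group). Because C is constant within each A group here, any valid n gives the same result; the output is indexed by A in sorted order.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

n = 0  # keep the first C value of each group
# Sum B per group; pick the nth C value per group by position.
result = df.groupby("A").agg({"B": "sum", "C": lambda x: x.iloc[n]})
print(result)
```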
Solution 2:[2]
Another option is to use groupby.agg with the 'first' aggregation on column C, which keeps the first value of C in each group:
out = df.groupby('A', as_index=False, sort=False).agg({'B':'sum', 'C':'first'})
Output:
     A   B  C
0  foo  34  California
1  bar  40  Rhode Island
2  baz  41  Ohio
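The same result can also be written with named aggregation (keyword arguments to agg, available in pandas 0.25 and later), which lets you name each output column directly. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

# Each keyword is an output column: (input column, aggregation).
out = df.groupby("A", as_index=False, sort=False).agg(
    B=("B", "sum"),
    C=("C", "first"),
)
print(out)
```

Named aggregation is handy when the output names should differ from the inputs, e.g. B_total=("B", "sum").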
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ah bon |
| Solution 2 | |
