groupby and max in pandas

I have the following DataFrame:

import pandas as pd

data = {'id': ['A', 'B', 'C', 'D', 'E'],
        'c1': [4,7,9,0,3],
        'c2': [1,8,3,0,2]}

data = pd.DataFrame(data)

I want to group by id and take the maximum value across c1 and c2 for each id.

The desired output:

data = {'id': ['A', 'B', 'C', 'D','E'],
        'c': [4,8,9,0,3]}

data = pd.DataFrame(data)

Solution 1:^[1]

First set id as the index, take the row-wise maximum, and then aggregate with max per id in case id contains duplicated values:

df = data.set_index('id').max(axis=1).groupby(level=0).max().reset_index(name='c')
print(df)
  id  c
0  A  4
1  B  8
2  C  9
3  D  0
4  E  3

If the id values are unique, as in the sample data, the aggregation step can be dropped:

df = data.set_index('id').max(axis=1).reset_index(name='c')
print(df)
  id  c
0  A  4
1  B  8
2  C  9
3  D  0
4  E  3
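
The sample data has unique ids, so the groupby step does nothing visible there. As a minimal sketch of when it matters, the frame below (named dup, a hypothetical example that is not part of the question) repeats some ids:

```python
import pandas as pd

# Hypothetical data: ids 'A' and 'B' each appear twice.
dup = pd.DataFrame({'id': ['A', 'A', 'B', 'B', 'C'],
                    'c1': [4, 7, 9, 0, 3],
                    'c2': [1, 8, 3, 0, 2]})

# Row-wise max across c1 and c2; the index still holds duplicated ids.
row_max = dup.set_index('id').max(axis=1)

# Collapse duplicated ids, keeping the largest row-wise max per id.
result = row_max.groupby(level=0).max().reset_index(name='c')
print(result)
```

Here each id ends up on a single row, holding the largest value found in any of its c1 or c2 cells.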

Solution 2:^[2]

Use melt and GroupBy.max:

(data.melt(id_vars='id', value_name='c')
     .groupby('id', as_index=False)['c'].max()
)

Output:

  id  c
0  A  4
1  B  8
2  C  9
3  D  0
4  E  3
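
To see why melt helps, it can be useful to look at the intermediate long-format frame. This is a sketch using the question's sample data; the variable names long and out are only for illustration:

```python
import pandas as pd

data = pd.DataFrame({'id': ['A', 'B', 'C', 'D', 'E'],
                     'c1': [4, 7, 9, 0, 3],
                     'c2': [1, 8, 3, 0, 2]})

# Long format: one row per (id, original column) pair.
# The c1/c2 values are stacked into a single column named 'c'.
long = data.melt(id_vars='id', value_name='c')
print(long)

# With all values in one column, a plain per-id max covers both c1 and c2.
out = long.groupby('id', as_index=False)['c'].max()
print(out)
```

Because every value lands in the same column, this approach also handles duplicated ids and more than two value columns without changes.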

Solution 3:^[3]

There are many ways to do this, and you don't necessarily need a groupby: provided each id appears only once, as in the sample data, a new column computed from the values of "c1" and "c2" is enough.

I like to use np.where in those cases:

import pandas as pd
import numpy as np

data = {'id': ['A', 'B', 'C', 'D', 'E'],
        'c1': [4,7,9,0,3],
        'c2': [1,8,3,0,2]}

data = pd.DataFrame(data)

data["c"] = np.where(data["c1"] >= data["c2"], data["c1"], data["c2"])
del data["c1"]
del data["c2"]

The first argument is the condition to evaluate, the second is the value returned where the condition is true, and the third is the value returned otherwise (the "else").
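
As a quick sanity check, the np.where result can be compared against a plain row-wise max on the same sample data. This is only a sketch; the names picked and row_max are illustrative:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'id': ['A', 'B', 'C', 'D', 'E'],
                     'c1': [4, 7, 9, 0, 3],
                     'c2': [1, 8, 3, 0, 2]})

# Element-wise choice: take c1 where c1 >= c2, otherwise take c2.
picked = np.where(data['c1'] >= data['c2'], data['c1'], data['c2'])

# Row-wise max over the same two columns, for comparison.
row_max = data[['c1', 'c2']].max(axis=1)

print(picked)
print((picked == row_max.to_numpy()).all())
```

A single np.where call chooses between two alternatives, so with more than two value columns the row-wise max is likely the simpler option.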

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	jezrael
Solution 2	mozway
Solution 3	Liutprand
