Dedup an ID, keeping the max or min of a specific variable, in a pandas DataFrame (Python) [closed]
For example, I want to dedup an ID and keep the maximum or minimum of a variable that I specify. Can I do that with some function in pandas? The data is a DataFrame. drop_duplicates() doesn't help because it doesn't keep the value that I want; it only keeps rows by order.
import pandas as pd
df = pd.DataFrame({
    'ID': ['245', '144', '245', '245', '144'],
    'Acesso': [3, 1, 1, 5, 2],
    'Number': [4, 4, 2, 2, 5]
})
I want an output deduplicated by "ID", keeping the minimum of "Acesso" and the maximum of "Number".
Solution 1:[1]
You can start by separately taking the min of Acesso grouped by ID and the max of Number grouped by ID. Each result is a Series indexed by ID.
You then just have to concatenate these column-wise into a single DataFrame. The code would look like this:
df_acesso_min = df.groupby("ID")["Acesso"].min()
df_number_max = df.groupby("ID")["Number"].max()
df = pd.concat([df_acesso_min, df_number_max], axis=1)
print(df)
#      Acesso  Number
# ID
# 144       1       5
# 245       1       4
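As a possible alternative, the same result can likely be produced in a single step by passing a column-to-function mapping to `agg` after the groupby. This is only a sketch built on the example data from the question; the variable name `result` and the choice of `as_index=False` (which keeps ID as an ordinary column rather than the index) are assumptions, not part of the original answer.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "ID": ["245", "144", "245", "245", "144"],
    "Acesso": [3, 1, 1, 5, 2],
    "Number": [4, 4, 2, 2, 5],
})

# One aggregation per column: minimum of Acesso, maximum of Number.
# as_index=False keeps ID as a regular column instead of the index.
result = df.groupby("ID", as_index=False).agg({"Acesso": "min", "Number": "max"})
print(result)
```

To keep the maximum of Acesso instead, or to add another column, only the mapping passed to `agg` should need to change.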
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rayan Hatout |


