Drop duplicate IDs, keeping the row where value equals a certain value, otherwise keep the first duplicate
>>> import pandas as pd
>>> df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '5'],
...                    'value': ['keep', 'y', 'x', 'keep', 'x', 'Keep', 'x', 'y', 'x']})
>>> print(df)
  id value
0  1  keep
1  1     y
2  2     x
3  2  keep
4  3     x
5  4  Keep
6  4     x
7  5     y
8  5     x
In this example, the idea is to keep rows 0, 3, and 5, since each belongs to a duplicated id and has the particular value 'keep' (spelled 'Keep' in row 5); row 4, since id 3 has no duplicates; and row 7, since it is the first of the duplicates for id 5, none of which has that value.
Solution 1:[1]
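In your case, try idxmax. Applied per id group to the boolean comparison, it returns the index of the first row where the comparison is True, and falls back to the first row of the group when no row matches.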
out = df.loc[df['value'].eq('keep').groupby(df.id).idxmax()]
Out[24]:
  id value
0  1  keep
3  2  keep
4  3     x
5  4  Keep
7  5     y
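Note that eq('keep') is case-sensitive, so 'Keep' in row 5 does not match; that row is selected only because it happens to be the first row for id 4. Below is a minimal sketch of a case-insensitive variant, assuming any capitalization of 'keep' should count as a match. The names is_keep and first_match are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '3', '4', '4', '5', '5'],
    'value': ['keep', 'y', 'x', 'keep', 'x', 'Keep', 'x', 'y', 'x'],
})

# Assumption: 'keep' should match regardless of capitalization.
is_keep = df['value'].str.lower().eq('keep')

# For each id, idxmax gives the index label of the first True;
# if the group has no True, it gives the group's first row.
first_match = is_keep.groupby(df['id']).idxmax()

out = df.loc[first_match]
print(out)
```

With this data the selected rows are the same as in the answer above, but row 5 is now chosen because it matches, so the result would not change if the rows for id 4 were reordered.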
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BENY |
