Pandas drop duplicates where condition
I would like to df.drop_duplicates() based off a subset, but also ignore if a column has a specific value.
For example...
            v1 v2   v3
ID
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat
I would like to drop duplicates on [ID, v1], but skip rows where v3 equals cat, so something like this (pseudocode):
full_df.drop_duplicates([ID, v1], inplace=True, conditional=exclude v3 = cat)
Hope that makes sense
Solution 1:[1]
Use boolean indexing with Series.duplicated and pd.Index.duplicated:
df[~(df['v1'].duplicated() & df.index.duplicated()) | df['v3'].eq('cat')]
Output
            v1 v2   v3
ID
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat
If ID is not the index:
df[~df[['ID', 'v1']].duplicated() | df['v3'].eq('cat')]
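Putting Solution 1 together with the question's sample data (a minimal sketch, assuming ID is the index as shown above):

```python
import pandas as pd

# Reconstruct the sample frame from the question, with ID as the index
df = pd.DataFrame(
    {"v1": [8751704.0, 9082007.0, 9082007.0, 9082007.0],
     "v2": ["G", "G", "G", "G"],
     "v3": ["dog", "dog", "dog", "cat"]},
    index=pd.Index([148, 123, 123, 123], name="ID"),
)

# Keep a row if it is NOT a duplicate on (ID, v1), OR if v3 == 'cat'
mask = ~(df["v1"].duplicated() & df.index.duplicated()) | df["v3"].eq("cat")
out = df[mask]
print(out)
# Keeps three rows: (148, dog), (123, dog), (123, cat)
```

Note that by default `duplicated()` marks all occurrences after the first as duplicates, so the first (123, 9082007.0, dog) row survives while the second is dropped.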
Solution 2:[2]
You could chain another condition using a bitwise AND, to ensure that v3 is not cat:
df[~(df.reset_index().duplicated(['ID', 'v1']) & df.v3.ne('cat').values).values]
            v1 v2   v3
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat
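Solution 2 expressed as a runnable sketch on the same sample data (assuming, as above, that ID is the index; `reset_index()` turns it back into a column so it can be named in `duplicated`):

```python
import pandas as pd

df = pd.DataFrame(
    {"v1": [8751704.0, 9082007.0, 9082007.0, 9082007.0],
     "v2": ["G", "G", "G", "G"],
     "v3": ["dog", "dog", "dog", "cat"]},
    index=pd.Index([148, 123, 123, 123], name="ID"),
)

# A row is dropped only when it is a duplicate on (ID, v1) AND v3 is not 'cat'
dup = df.reset_index().duplicated(["ID", "v1"]).values
kept = df[~(dup & df["v3"].ne("cat").values)]
print(kept)
# Keeps the same three rows as Solution 1
```

The `.values` calls strip the indexes so the two boolean masks align positionally rather than by label, which matters because `reset_index()` produces a fresh RangeIndex.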
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | yatu |
