Pandas drop duplicates where condition

I would like to use df.drop_duplicates() based on a subset of columns, but also skip rows where a given column has a specific value.

For example...

                 v1      v2     v3      
ID                                                          
148         8751704.0    G      dog   
123         9082007.0    G      dog  
123         9082007.0    G      dog 
123         9082007.0    G      cat   

I would like to drop duplicates on [ID, v1], but keep rows where v3 equals cat. Something like this:

full_df.drop_duplicates(['ID', 'v1'], inplace=True, conditional="exclude v3 == 'cat'")

Hope that makes sense



Solution 1:[1]

Use boolean indexing with Series.duplicated and pd.Index.duplicated:

df[~(df['v1'].duplicated() & df.index.duplicated()) | df['v3'].eq('cat')]

Output

            v1 v2   v3
ID                    
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat

If ID is not the index, use DataFrame.duplicated on both columns instead:

df[~df[['ID', 'v1']].duplicated() | df['v3'].eq('cat')]
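As a self-contained check, the snippet below reconstructs the sample frame from the question (with ID as the index) and applies the Solution 1 mask; the first occurrence of each duplicated (ID, v1) pair is kept, and every cat row survives regardless:

```python
import pandas as pd

# Sample data from the question, with ID as the index
df = pd.DataFrame(
    {"ID": [148, 123, 123, 123],
     "v1": [8751704.0, 9082007.0, 9082007.0, 9082007.0],
     "v2": ["G", "G", "G", "G"],
     "v3": ["dog", "dog", "dog", "cat"]}
).set_index("ID")

# Keep rows that are NOT a repeat of both v1 and the index,
# OR whose v3 is 'cat'
filtered = df[~(df["v1"].duplicated() & df.index.duplicated())
              | df["v3"].eq("cat")]
print(filtered)
```

Note that combining `Series.duplicated` on `v1` with `Index.duplicated` via `&` is not identical to checking the (ID, v1) pair jointly in every dataset; for data like this, where the two columns duplicate together, it gives the same result as the pairwise check.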

Solution 2:[2]

You could chain another condition using a bitwise AND, to ensure that v3 is not cat:

df[~(df.reset_index().duplicated(['ID', 'v1']) & df.v3.ne('cat').values).values]

            v1 v2   v3
ID                    
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat
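Run end to end on the question's data (rebuilding the frame here for the sketch), the mask marks a row for removal only when it is a duplicate of (ID, v1) *and* its v3 is not cat, so the cat row is exempt from deduplication:

```python
import pandas as pd

# Sample data from the question, with ID as the index
df = pd.DataFrame(
    {"ID": [148, 123, 123, 123],
     "v1": [8751704.0, 9082007.0, 9082007.0, 9082007.0],
     "v2": ["G", "G", "G", "G"],
     "v3": ["dog", "dog", "dog", "cat"]}
).set_index("ID")

# reset_index() turns ID into a column so duplicated() can see it;
# .values strips the differing indexes before combining the masks
kept = df[~(df.reset_index().duplicated(["ID", "v1"])
            & df["v3"].ne("cat").values).values]
print(kept)
```

`reset_index` is what makes `duplicated(['ID', 'v1'])` possible when ID lives in the index; the `.values` calls avoid index-alignment problems when ANDing a mask built on the reset frame with one built on the original.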

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution | Source
Solution 1 |
Solution 2 | yatu