Pandas: remove duplicates based on substring [duplicate]

I have the following two columns from a Pandas DataFrame:

antecedents        consequents
  apple               orange
  orange              apple

  apple               water
  apple               pineapple

  water               lemon
  lemon               water

I would like to remove rows whose pair of values already appeared with antecedents and consequents swapped, keeping only the first occurrence, and thus obtain:

antecedents        consequents
  apple               orange

  apple               water
  apple               pineapple

  water               lemon

How can I achieve that using Pandas?



Solution 1:[1]

Build a frozenset from both columns in each row, then test for duplicates with Series.duplicated:

df2 = df[~df[['antecedents','consequents']].apply(frozenset,axis=1).duplicated()]
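As a self-contained sketch of the frozenset approach, the example below rebuilds the question's six rows by hand (the DataFrame construction and the intermediate name `key` are assumptions added for illustration). A frozenset ignores order, so ('apple', 'orange') and ('orange', 'apple') give equal keys:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed default integer index).
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

# One order-insensitive, hashable key per row.
key = df[["antecedents", "consequents"]].apply(frozenset, axis=1)

# duplicated() flags every repeat after the first occurrence
# (keep="first" is the default), so negating it keeps the first of each pair.
df2 = df[~key.duplicated()]
print(df2)
```

Rows 1 and 5 are dropped because their keys match rows 0 and 4 respectively.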

Or sort the values within each row with numpy.sort:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.sort(df[['antecedents','consequents']], axis=1), index=df.index)
df2 = df[~df1.duplicated()]

print(df2)
  antecedents consequents
0       apple      orange
2       apple       water
3       apple   pineapple
4       water       lemon
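The sorting variant can be run end to end with a minimal sketch like the one below, assuming the same six rows as in the question (the DataFrame construction and the name `pairs` are illustrative, not part of the original answer). Sorting within each row turns mirrored pairs into identical rows, which DataFrame.duplicated can then detect:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed default integer index).
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

# Sort the two values inside each row; axis=1 sorts across columns.
pairs = np.sort(df[["antecedents", "consequents"]].to_numpy(), axis=1)

# Keep the original index so the boolean mask aligns with df.
df1 = pd.DataFrame(pairs, index=df.index)

# Drop rows whose sorted pair was already seen earlier.
df2 = df[~df1.duplicated()]
print(df2)
```

Because this avoids a Python-level function call per row, it will typically scale better than the row-wise apply on large frames, though that is worth measuring on your own data.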

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1