Comparing every row with all other rows with pandas
My goal is to compare every row with all other rows to see how many rows are unique with respect to their entries. I am quite new to pandas, so I am at a loss. An example dataframe would be as follows:
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3],
                   "age": [46, 48, 55],
                   "gender": ['female', 'female', 'male']},
                  index=[0, 1, 2])
Solution 1:[1]
What do you want to obtain exactly?
If you want to know per column how many unique values you have, use nunique:
df.nunique()
ID 3
age 3
gender 2
dtype: int64
If you want to know how many unique rows (considering combinations of columns), use duplicated:
len(df) - df[['age', 'gender']].duplicated().sum()
# or
(~df.drop(columns='ID').duplicated()).sum()
# or
(~df[['age', 'gender']].duplicated()).sum()
3
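Since the example dataframe has no repeated age/gender combination, all three variants return the same number. As a rough sketch of how "unique" can mean two different things, the snippet below uses a hypothetical four-row dataframe (the extra row is an assumption added for illustration, not part of the question) and counts both the distinct combinations and the rows whose combination occurs exactly once:

```python
import pandas as pd

# Hypothetical data: rows 0 and 3 share the same age/gender combination.
df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 46],
                   "gender": ["female", "female", "male", "female"]})

cols = ["age", "gender"]

# Distinct combinations: each combination is counted once (its first occurrence).
n_distinct = (~df.duplicated(subset=cols)).sum()

# Rows with no twin: keep=False flags every member of a duplicate group.
n_singletons = (~df.duplicated(subset=cols, keep=False)).sum()

# Number of rows per combination, useful for seeing which ones repeat.
group_sizes = df.groupby(cols).size()

print(n_distinct)    # 3
print(n_singletons)  # 2
print(group_sizes)
```

Which count is the right one depends on what the question means by "unique": the default keep='first' treats one row of each duplicate group as unique, while keep=False treats none of them as unique.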
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
