'Pandas drop duplicates across columns
Good day,
I need a way to check each row of a dataframe and drop the row if all the values in that row (across the score columns) are the same. The person_id may differ.
Here is a part of the dataset:
In:
data = [[7, 10, 10, 10, 10], [17, 10, 10, 10, 10], [18, 8, 10, 10, 10], [20, 10, 10, 9, 9], [25, 9, 8, 8, 7]]
df = pd.DataFrame(data, columns = ['person_id', 'score_1', 'score_2', 'score_3', 'score_4'])
df
Out:
person_id score_1 score_2 score_3 score_4
0 7 10 10 10 10
1 17 10 10 10 10
2 18 8 10 10 10
3 20 10 10 9 9
4 25 9 8 8 7
The desired output would be:
person_id score_1 score_2 score_3 score_4
2 18 8 10 10 10
3 20 10 10 9 9
4 25 9 8 8 7
Since row 0 (person_id 7) and row 1 (person_id 17) have the same scores.
The number of columns will also change, adding more score columns - thus, I cannot use
df_no_duplicates = df.loc[(df.score_1 != df.score_2) | (df.score_2 != df.score_3)| (df.score_3 != df.score_4)]
Solution 1:[1]
You can try nunique
out = df[df.filter(like='score').nunique(1)>1].copy()
Out[208]:
person_id score_1 score_2 score_3 score_4
2 18 8 10 10 10
3 20 10 10 9 9
4 25 9 8 8 7
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BENY |
