Pandas drop duplicates across columns

Good day,

I need a way to check each row of a dataframe and drop the row if all the values in that row (across the score columns) are the same. The person_id may differ.

Here is a part of the dataset:

In:

import pandas as pd

data = [[7, 10, 10, 10, 10], [17, 10, 10, 10, 10], [18, 8, 10, 10, 10], [20, 10, 10, 9, 9], [25, 9, 8, 8, 7]]
df = pd.DataFrame(data, columns=['person_id', 'score_1', 'score_2', 'score_3', 'score_4'])
df

Out:

    person_id   score_1 score_2 score_3 score_4
0   7           10      10      10      10
1   17          10      10      10      10
2   18          8       10      10      10
3   20          10      10      9       9
4   25          9       8       8       7

The desired output would be:

    person_id   score_1 score_2 score_3 score_4
2   18          8       10      10      10
3   20          10      10      9       9
4   25          9       8       8       7

Row 0 (person_id 7) and row 1 (person_id 17) are dropped because all of their scores are identical. The number of score columns will also grow over time, so I cannot hard-code the comparison like this:

df_no_duplicates = df.loc[(df.score_1 != df.score_2) | (df.score_2 != df.score_3) | (df.score_3 != df.score_4)]

python pandas

Solution 1:^[1]

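You can try nunique along axis 1: it counts the distinct values in each row of the score columns, so you keep only the rows where that count is greater than 1.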

out = df[df.filter(like='score').nunique(axis=1) > 1].copy()
out
Out[208]: 
   person_id  score_1  score_2  score_3  score_4
2         18        8       10       10       10
3         20       10       10        9        9
4         25        9        8        8        7
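
For anyone who wants to reuse this, here is a minimal sketch that wraps the same nunique idea in a function. The function name drop_uniform_score_rows and the prefix parameter are my own illustrative choices, not part of pandas or the original answer. It assumes every score column has "score" somewhere in its name, since filter(like=...) matches on substrings. Also note that nunique ignores NaN by default, so a row such as 10, NaN, 10, 10 would count as uniform here.

```python
import pandas as pd


def drop_uniform_score_rows(df, prefix="score"):
    """Return a copy of df without rows whose score columns all share one value.

    Hypothetical helper: `prefix` is matched as a substring of the column names.
    """
    # Select every column whose label contains the prefix, however many there are.
    score_cols = df.filter(like=prefix)
    # Count distinct values per row (axis=1); uniform rows have exactly one.
    distinct_per_row = score_cols.nunique(axis=1)
    return df[distinct_per_row > 1].copy()


data = [[7, 10, 10, 10, 10], [17, 10, 10, 10, 10], [18, 8, 10, 10, 10],
        [20, 10, 10, 9, 9], [25, 9, 8, 8, 7]]
df = pd.DataFrame(data, columns=['person_id', 'score_1', 'score_2', 'score_3', 'score_4'])

result = drop_uniform_score_rows(df)
print(result)
```

Because the columns are picked by name pattern rather than listed one by one, adding score_5, score_6 and so on later should not require any change to the function.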

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	BENY
