'Pandas deleting partly duplicate rows with wrong values in specific columns

I have a large dataframe from a csv file which has a few dozen columns. I have another csv file which I concatenated to the original. Now, the second file has exactly the same structure but a particular column may have incorrect values. I want to delete rows which are duplicates that have this one wrong column. For example in the below the last row should be removed. (The names of the specimens (Albert, etc.) are unique). I have been struggling to find a way of deleting only the data which has the wrong value, without risking deleting the correct row.

0      Albert       alive
1      Newton       alive
2      Galileo      alive
3      Copernicus   dead
4      Galileo      dead
...

Any help would be greatly appreciated!

Solution 1:^[1]

You could use this to determine if a name is mentioned more than 1 time

df['RN'] = df.groupby(['Name']).cumcount() + 1

You can also expand it out to have more columns in the "groupby" to see if there are any more limitations you want to put on the duplicates

df['RN'] = df.groupby(['Name', 'Another Column']).cumcount() + 1

The advantage I like with this is it gives you more control over the RN selection if you needed to df.loc[df['RN'] > 2].

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	ArchAngelPwn

'Pandas deleting partly duplicate rows with wrong values in specific columns

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]