Dropping duplicates in pandas based on several values being the same [duplicate]
Hi, I want to drop rows of a pandas dataframe when a subset of columns has the same values.
It could be done with, e.g., iterrows(), going through each row and collecting the index of every row that fulfills the condition.
However, is there a more pythonic/pandas way?
Example:
We have a dataframe with columns "name", "age", "school", "grade", "sex".
Now I want to remove all rows that are duplicates with respect to, e.g., "age", "school" and "grade", keeping, e.g., the first occurrence.
So:
Tom, 17, WestHigh, 5.0, M
Ray, 14, NorthLow, 2.1, F
Ane, 17, WestHigh, 5.0, F
should result in
Tom, 17, WestHigh, 5.0, M
Ray, 14, NorthLow, 2.1, F
Thanks a lot
Solution 1:[1]
I would suggest:
df = df.drop_duplicates(subset=["age", "school", "grade"], keep="first")
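To make this concrete, here is a minimal sketch that rebuilds the three example rows from the question and applies the same call. The variable names `key_columns`, `deduped` and `is_repeat` are only illustrative, and the dataframe is constructed by hand from the values shown in the question.

```python
import pandas as pd

# Example rows from the question, built by hand for illustration.
df = pd.DataFrame(
    {
        "name": ["Tom", "Ray", "Ane"],
        "age": [17, 14, 17],
        "school": ["WestHigh", "NorthLow", "WestHigh"],
        "grade": [5.0, 2.1, 5.0],
        "sex": ["M", "F", "F"],
    }
)

# Columns that must all match for two rows to count as duplicates.
key_columns = ["age", "school", "grade"]

# Keep the first row of each group of matching rows and drop the rest.
deduped = df.drop_duplicates(subset=key_columns, keep="first")
print(deduped)

# The same decision as a boolean mask: True marks a row that repeats
# an earlier row on the key columns.
is_repeat = df.duplicated(subset=key_columns, keep="first")
print(is_repeat)
```

Here "Ane" is dropped because her age, school and grade match Tom's, while "name" and "sex" are ignored for the comparison. With keep="last" the last row of each group is kept instead, and keep=False drops every row that has a match. The surviving rows keep their original index labels; chaining .reset_index(drop=True) renumbers them if needed.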
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Florida Man |
