'Pandas drop row based on groupby AND partial string match
I have a large pandas DataFrame with numerous columns. I want to group by serial number AND where there are duplicates to keep the row where the product ID ends in -RF. The first part I can achieve with a groupby(subset='Serial Number'), however I'm at a loss as to how combine this and keep/drop row based on a regex ('-RF$'). How can I achieve this?
Input:
| Serial Number | Product ID |
|---|---|
| ABC1745AABC | ABC-SUP2E-RF |
| ABC1745AABC | ABC-SUP2E |
| ABC1745AAFF | ABC-SUP2E |
| ABC1745AAFE | ABC-SUP2E |
| ABC1745AAB1 | ABC-SUP2E-WS |
| ABC1745AAB1 | ABC-SUP2E |
Ultimately, I want to be left with something like this (output):
| Serial Number | Product ID |
|---|---|
| ABC1745AABC | ABC-SUP2E-RF |
| ABC1745AAFF | ABC-SUP2E |
| ABC1745AAFE | ABC-SUP2E |
| ABC1745AAB1 | ABC-SUP2E-WS |
| ABC1745AAB1 | ABC-SUP2E |
Data:
{'Serial Number': ['ABC1745AABC', 'ABC1745AABC', 'ABC1745AAFF', 'ABC1745AAFE'],
'Product ID': ['ABC-SUP2E-RF', 'ABC-SUP2E', 'ABC-SUP2E', 'ABC-SUP2E']}
Solution 1:[1]
You could add a column to mark rows ending with "RF", then sort values to leave those rows at the top of each group. And finally just group and take the first row:
df["RF"] = df["Product ID"].str.endswith("-RF")
df = df.sort_values(["Serial Number", "RF"], ascending=False)
output = df.groupby("Serial Number").first()[["Serial Number", "Product ID"]]
Output:
Serial Number Product ID
2 ABC1745AAFF ABC-SUP2E
3 ABC1745AAFE ABC-SUP2E
0 ABC1745AABC ABC-SUP2E-RF
Solution 2:[2]
Thanks for you help. I've solved it like this:
df = df.sort_values(["Serial Number", "Product ID"], ascending=(True, True))
df = df.drop_duplicates(subset=['Serial Number', 'Product Group'], keep='first')
Fortunately, the product ID I kept was the longest string in all cases. It would be good to find a solution for scenarios where this was not always the case.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | aaossa |
| Solution 2 | user3709511 |
