How to find duplicate rows based on a given combination of columns and roll up observations in a pandas data frame? [duplicate]
I have a data frame as shown below:
import pandas as pd

df_add = pd.DataFrame({
    'doc_id': [100, 101, 102, 103],
    'last_name': ['Mallesham', 'Mallesham', 'Samba', 'Bhavik'],
    'first_name': ['Yamulla', 'Yamulla', 'Anil', 'Yamulla'],
    'dob': ['06-03-1900', '06-03-1900', '20-09-2020', '09-16-2020']
})
Here doc_id 100 and 101 are duplicate rows when last name, first name and DOB are considered together.
My requirement is to roll up 101 into 100 as follows:
doc_id should be filled in as 100;101, with a semicolon separator.
In a second case:
If I consider just the last_name and first_name combination, it should display as below, since persons with the same name might have different DOBs.
Solution 1:[1]
You need to convert doc_id to str in order to use the str.cat function.
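# Cast the ids to strings so they can be concatenated
df_add["doc_id"] = df_add["doc_id"].astype(str)
# Select doc_id as a Series (single brackets) so the .str accessor is available;
# sep=',' is used here, pass sep=';' for the semicolon separator asked for in the question
resultant_df = df_add.groupby(["first_name",
                               "last_name", "dob"])["doc_id"].apply(lambda x: x.str.cat(sep=','))
print(resultant_df.reset_index())
  first_name  last_name         dob   doc_id
0       Anil      Samba  20-09-2020      102
1    Yamulla     Bhavik  09-16-2020      103
2    Yamulla  Mallesham  06-03-1900  100,101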
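The same idea can be extended to cover both cases from the question. The following is only a minimal sketch, not part of the original answer: `roll_up` is a hypothetical helper name, and it assumes that every non-key column should be collapsed by joining its distinct values with the separator. It also shows `DataFrame.duplicated` with `keep=False` as one way to list the duplicated rows before rolling them up.

```python
import pandas as pd

df_add = pd.DataFrame({
    "doc_id": [100, 101, 102, 103],
    "last_name": ["Mallesham", "Mallesham", "Samba", "Bhavik"],
    "first_name": ["Yamulla", "Yamulla", "Anil", "Yamulla"],
    "dob": ["06-03-1900", "06-03-1900", "20-09-2020", "09-16-2020"],
})


def roll_up(df, keys, sep=";"):
    """Hypothetical helper: one row per key combination, other columns joined."""
    # Columns that are not part of the key get their distinct values joined.
    value_cols = [c for c in df.columns if c not in keys]

    def join_distinct(s):
        # Cast to str first so numeric ids can be concatenated.
        return sep.join(s.astype(str).drop_duplicates())

    agg_spec = {c: join_distinct for c in value_cols}
    # sort=False keeps groups in order of first appearance.
    return df.groupby(keys, as_index=False, sort=False).agg(agg_spec)


# Rows that share last name, first name and DOB with at least one other row.
full_keys = ["last_name", "first_name", "dob"]
dupes = df_add[df_add.duplicated(subset=full_keys, keep=False)]

# Case 1: roll up on last name, first name and DOB.
by_full = roll_up(df_add, full_keys)

# Case 2: roll up on the name only; differing DOBs would be joined as well.
by_name = roll_up(df_add, ["last_name", "first_name"])

print(dupes)
print(by_full)
print(by_name)
```

With the sample data, the name-only roll-up gives the same groups as the first case, because the two Mallesham rows share one DOB. If they differed, the `dob` column of that row would hold both values separated by a semicolon.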
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | qaiser |