How to find duplicate rows based on given combination of columns and roll up observations in pandas data frame? [duplicate]

I have a data frame as below:

import pandas as pd

df_add = pd.DataFrame({
    'doc_id':[100,101,102,103],
    'last_name':['Mallesham','Mallesham','Samba','Bhavik'],
    'first_name':['Yamulla','Yamulla','Anil','Yamulla'],
    'dob':['06-03-1900','06-03-1900','20-09-2020','09-16-2020']
})


Here doc_id 100 and 101 are duplicate rows when last name, first name and DOB are considered.

My requirement is to roll up 101 into 100, so that doc_id is filled in as 100;101 with a semicolon separator.

In a second case, if I consider just the last_name and first_name combination, the rows should be rolled up on those two columns alone, since people with the same name might have different DOBs.



Solution 1:[1]

You need to convert doc_id to str in order to use the str.cat function.

# Cast doc_id to str so the values can be concatenated
df_add["doc_id"] = df_add["doc_id"].astype(str)
# Select doc_id as a Series (single brackets) so the .str accessor is available
resultant_df = df_add.groupby(["first_name",
           "last_name","dob"])["doc_id"].apply(lambda x : x.str.cat(sep=';'))

print(resultant_df.reset_index())

     first_name  last_name  dob         doc_id
0    Anil        Samba      20-09-2020  102
1    Yamulla     Bhavik     09-16-2020  103
2    Yamulla     Mallesham  06-03-1900  100;101
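
For the second case (grouping on last_name and first_name only), one possible approach is sketched below. The helper name roll_up and its parameters are hypothetical, not part of pandas. The sketch also assumes that every non-key column, including dob, should be joined with the same separator and that repeated values should appear only once; the expected output from the question is not available, so treat that as a guess. It also shows DataFrame.duplicated with keep=False, which flags every row that belongs to a duplicated key combination.

```python
import pandas as pd

df_add = pd.DataFrame({
    "doc_id": [100, 101, 102, 103],
    "last_name": ["Mallesham", "Mallesham", "Samba", "Bhavik"],
    "first_name": ["Yamulla", "Yamulla", "Anil", "Yamulla"],
    "dob": ["06-03-1900", "06-03-1900", "20-09-2020", "09-16-2020"],
})


def roll_up(df, keys, sep=";"):
    """Hypothetical helper: one row per key combination,
    with every other column joined into a single string."""
    def join_unique(values):
        # Cast to str and drop repeated values, keeping first-seen order.
        return sep.join(values.astype(str).drop_duplicates())

    others = [c for c in df.columns if c not in keys]
    # sort=False keeps the groups in order of first appearance.
    return df.groupby(keys, sort=False)[others].agg(join_unique).reset_index()


# Rows whose (last_name, first_name) pair occurs more than once.
dupes = df_add[df_add.duplicated(subset=["last_name", "first_name"], keep=False)]

# Second case: roll up on the name columns only.
by_name = roll_up(df_add, ["last_name", "first_name"])

# First case: roll up on name plus dob.
by_name_and_dob = roll_up(df_add, ["last_name", "first_name", "dob"])

print(by_name)
```

With the sample data both calls give the same groups, because the only repeated name also shares a dob. If two people had the same name but different dates of birth, roll_up on the name columns alone would merge them into one row and join their dob values with the separator as well.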

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 qaiser