How do I combine two datasets in pandas and keep unique rows only?

I have a dataset with a product list that gets a new row every time a customer makes a purchase. As a result, the same customer number occurs on multiple rows when a customer buys multiple products. I have managed to create a new DataFrame that is grouped per customer number, which reduces the dataset from 16,000 rows to around 3,000. Now, when I combine the two, I want to keep the grouped party numbers so the data stays organized, but for some reason the result keeps going back to 16,000 rows.

My code is as follows:

import pandas as pd

# Create a pandas DataFrame with the useful variables
# (pt is the full source table, loaded elsewhere)
prod = pd.DataFrame(pt[['Party Nbr', 'Product Nm', 'Category Desc', 'Group Desc']])

# Add a column with the total number of products per party
prod['Product Cnt'] = prod.groupby('Party Nbr')['Party Nbr'].transform('count')

And this is a sample of rows from the result:

Party Nbr   Product Nm                      Category Desc   Group Desc        Product Cnt
79695728.0  Betaalpas                       Betaaldiensten  Pas               14
79741169.0  ING Business Card               Betaaldiensten  Creditcard        21
79907032.0  Mijn ING.nl                     Betaaldiensten  Beheerfaciliteit  4
80139442.0  Zakelijke Oranje Spaarrekening  Sparen          Giraal sparen     7
80193730.0  PIN Pakket                      Betaaldiensten  Betaalfaciliteit  5

The result has 16,000 rows.
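For reference, a minimal sketch of why the row count stays at the full size after adding the count column. The data below is made up purely for illustration and the name per_party is hypothetical. It contrasts transform('count'), which writes the group count back onto every original row, with size(), which returns one row per party.

```python
import pandas as pd

# Made-up toy data standing in for the real product table
prod = pd.DataFrame({
    "Party Nbr": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Product Nm": ["A", "B", "C", "A", "D"],
    "Category Desc": ["Betaaldiensten", "Sparen", "Betaaldiensten", "Sparen", "Sparen"],
})

# transform broadcasts the per-group count back onto every original row,
# so the number of rows does not change
prod["Product Cnt"] = prod.groupby("Party Nbr")["Party Nbr"].transform("count")

# size() aggregates instead: one row per party (per_party is a hypothetical name)
per_party = prod.groupby("Party Nbr").size().reset_index(name="Product Cnt")

print(len(prod))       # number of purchase rows
print(len(per_party))  # number of distinct parties
```

In this toy case prod keeps all five purchase rows, while per_party has one row for each of the two parties.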

Then I grouped on party number to get a column of grouped categories, like this:

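# One row per party, with its categories collected into a list
pf = (prod.groupby('Party Nbr')['Category Desc'].apply(list).reset_index()
      .rename(columns={'Category Desc': 'Categories'}))
# Deduplicate the categories per party and store them as a tuple
pf['Categories'] = pf['Categories'].apply(set).apply(tuple)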

This gives me the following, with 3,000 rows:

Party Nbr   Categories
79687857.0  (Betaaldiensten, Sparen)
79687954.0  (nan, Betaaldiensten, Sparen)
79688233.0  (Betaaldiensten,)
79688438.0  (Betaaldiensten, Sparen)
79688845.0  (Betaaldiensten, Sparen)

How can I combine the two and keep the party number selection from the second table?
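One possible reading of the goal, sketched on made-up data: if the combined table should have one row per party, the row-level table needs to be aggregated to one row per party before merging. Merging the grouped table with the row-level table matches each party to all of its purchase rows, which would explain the return to the full row count. The names counts, combined and expanded are hypothetical, and which per-party columns to keep is an assumption here.

```python
import pandas as pd

# Made-up toy data standing in for the real product table
prod = pd.DataFrame({
    "Party Nbr": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Product Nm": ["A", "B", "C", "A", "D"],
    "Category Desc": ["Betaaldiensten", "Sparen", "Betaaldiensten", "Sparen", "Sparen"],
})

# Grouped table as in the question: one row per party with unique categories
pf = (prod.groupby("Party Nbr")["Category Desc"].apply(list).reset_index()
      .rename(columns={"Category Desc": "Categories"}))
pf["Categories"] = pf["Categories"].apply(set).apply(tuple)

# Aggregate the row-level information to one row per party before merging
counts = prod.groupby("Party Nbr").size().reset_index(name="Product Cnt")

# One-to-one merge: the result keeps one row per party;
# validate raises an error if either side has duplicate keys
combined = pf.merge(counts, on="Party Nbr", how="left", validate="one_to_one")

# For contrast: merging with the row-level table repeats each party
# once per purchase row
expanded = prod.merge(pf, on="Party Nbr", how="left")

print(len(combined))  # number of distinct parties
print(len(expanded))  # number of purchase rows
```

Columns such as the product name have several values per party, so they would need their own aggregation (for example a list or tuple, as done for the categories) to fit in a one-row-per-party table.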



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
