How do I combine two datasets in pandas and keep unique rows only?
I have a product dataset that gets a new row every time a customer makes a purchase. As a result, the same customer number occurs in multiple rows when a customer purchases multiple products. I have managed to create a new DataFrame grouped per customer number, which reduces the dataset from 16.000 to around 3.000 rows. When I combine the two, I want to keep the grouped party numbers so the data stays organized, but for some reason the result keeps going back to 16.000 rows.
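A likely reason for the row count, assuming the two tables are combined with a merge on `Party Nbr` (the merge code itself is not shown): joining a purchase-level table to a per-customer table is a many-to-one join, so pandas produces one output row per purchase row. The sketch below uses small hypothetical toy data to illustrate this; `validate="many_to_one"` only checks that the customer-level table has unique keys.

```python
import pandas as pd

# Hypothetical toy data: one row per purchase (customer 1 bought twice).
purchases = pd.DataFrame({
    "Party Nbr": [1.0, 1.0, 2.0],
    "Product Nm": ["Betaalpas", "PIN Pakket", "Mijn ING.nl"],
})
# Hypothetical grouped table: one row per customer.
customers = pd.DataFrame({
    "Party Nbr": [1.0, 2.0],
    "Categories": [("Betaaldiensten",), ("Betaaldiensten",)],
})

# Every purchase row matches exactly one customer row, so the result keeps
# one row per purchase; validate= raises if the right side has duplicate keys.
merged = purchases.merge(customers, on="Party Nbr", how="left",
                         validate="many_to_one")
print(len(purchases), len(customers), len(merged))  # 3 2 3
```

The merged result has as many rows as the purchase table, which mirrors the jump back to 16.000 rows.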
My code is as follows:

```python
import pandas as pd

# Create a pandas DataFrame with the useful variables
# (pt is the original purchase-level DataFrame)
prod = pd.DataFrame(pt[['Party Nbr', 'Product Nm', 'Category Desc', 'Group Desc']])
# Add a column with the total number of products per party
prod['Product Cnt'] = prod.groupby('Party Nbr')['Party Nbr'].transform('count')
```
And this is a sample of rows from the result:
| Party Nbr | Product Nm | Category Desc | Group Desc | Product Cnt |
|---|---|---|---|---|
| 79695728.0 | Betaalpas | Betaaldiensten | Pas | 14 |
| 79741169.0 | ING Business Card | Betaaldiensten | Creditcard | 21 |
| 79907032.0 | Mijn ING.nl | Betaaldiensten | Beheerfaciliteit | 4 |
| 80139442.0 | Zakelijke Oranje Spaarrekening | Sparen | Giraal sparen | 7 |
| 80193730.0 | PIN Pakket | Betaaldiensten | Betaalfaciliteit | 5 |
(16.000 rows in total)
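For reference, `transform('count')` broadcasts each group's count back onto every row of that group, so it adds a column without reducing the number of rows. A minimal sketch with hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for the purchase-level table.
prod = pd.DataFrame({
    "Party Nbr": [1.0, 1.0, 1.0, 2.0],
    "Product Nm": ["Betaalpas", "PIN Pakket", "Mijn ING.nl", "Betaalpas"],
})

# transform returns a Series aligned to the original index, so every
# purchase row receives its party's total; the row count is unchanged.
prod["Product Cnt"] = prod.groupby("Party Nbr")["Party Nbr"].transform("count")
print(prod["Product Cnt"].tolist())  # [3, 3, 3, 1]
print(len(prod))  # 4
```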
Then I grouped by party number to get a column of grouped categories:
```python
pf = prod.groupby(['Party Nbr'])['Category Desc'].apply(list).reset_index().rename(
    columns={'Category Desc': 'Categories'})
pf['Categories'] = pf['Categories'].apply(set).apply(tuple)
```
This gives me the following table with 3.000 rows:
| Party Nbr | Categories |
|---|---|
| 79687857.0 | (Betaaldiensten, Sparen) |
| 79687954.0 | (nan, Betaaldiensten, Sparen) |
| 79688233.0 | (Betaaldiensten,) |
| 79688438.0 | (Betaaldiensten, Sparen) |
| 79688845.0 | (Betaaldiensten, Sparen) |
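One side note on the grouping step: iterating over a `set` has no guaranteed order, so the same categories can end up in a different order for different parties. A sketch of an alternative (not the original code) uses `SeriesGroupBy.unique()`, which keeps the order of first appearance and, as in the table above, keeps a missing category once as `nan`. The toy data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: party 2 has a missing category.
prod = pd.DataFrame({
    "Party Nbr": [1.0, 1.0, 1.0, 2.0],
    "Category Desc": ["Sparen", "Betaaldiensten", "Sparen", np.nan],
})

pf = (
    prod.groupby("Party Nbr")["Category Desc"]
    .unique()        # one array of distinct categories per party, in first-appearance order
    .apply(tuple)    # convert each array to a hashable tuple
    .reset_index()
    .rename(columns={"Category Desc": "Categories"})
)
print(pf)
```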
How can I combine the two and keep the party number selection from the second table?
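One possible way to combine the two while keeping one row per party (a sketch under stated assumptions, not taken from an answer to this question): collapse the purchase-level table to one row per `Party Nbr` before merging, so both sides have unique keys and the result keeps the row count of the grouped table. The toy data and the output column names `Products` and `Product_Cnt` are hypothetical.

```python
import pandas as pd

# Hypothetical purchase-level table (already has Product Cnt on every row).
prod = pd.DataFrame({
    "Party Nbr": [1.0, 1.0, 2.0],
    "Product Nm": ["Betaalpas", "PIN Pakket", "Mijn ING.nl"],
    "Product Cnt": [2, 2, 1],
})
# Hypothetical grouped table: one row per party.
pf = pd.DataFrame({
    "Party Nbr": [1.0, 2.0],
    "Categories": [("Betaaldiensten",), ("Betaaldiensten",)],
})

# Collapse the purchase table to one row per party first...
per_party = prod.groupby("Party Nbr", as_index=False).agg(
    Products=("Product Nm", list),          # all products bought by the party
    Product_Cnt=("Product Cnt", "first"),   # identical within a party, so take one
)
# ...then merge; validate= raises if either side has duplicate keys.
combined = pf.merge(per_party, on="Party Nbr", how="left",
                    validate="one_to_one")
print(combined)
```

If only one representative purchase row per party is needed, `prod.drop_duplicates('Party Nbr')` is a simpler alternative, but it keeps only the first product of each party and discards the rest.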
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
