Remove duplicate values in a tuple array in Python
I have a party who purchases products. Every time the customer purchases a product, a new row is generated with the same party number.
I have grouped the products by party number, and I am now stuck with a column that holds a tuple of products in each row:
| Party Nbr | Product |
|---|---|
| 1 | (a, a, a, a, b, c) |
| 2 | (a, d, a, a) |
| 3 | (a, a, b, b, b) |
I can't find out how to remove all duplicates from each row of the Product column.
Code for the groupby:
pf = prod.groupby(['Party Nbr'])['Product name'].apply(tuple).reset_index().rename(columns= {'Product name': 'Product'})
My attempt to remove the duplicates:
pf['Product'] = tuple(set(pf['Product']))
This raises:
ValueError: Length of values (4663) does not match length of index (32539)
Is anyone able to help me?
Solution 1:[1]
Assuming you are using pandas, I recreated your table as a dataframe to show how you could do the transformation.
In [10]: import pandas as pd
In [11]: df = pd.DataFrame({
    "party": [1, 2, 3],
    "product": [
        ("a", "a", "a", "a", "b", "c"),
        ("a", "d", "a", "a"),
        ("a", "a", "b", "b", "b")]})
In [12]: df
Out[12]:
party product
0 1 (a, a, a, a, b, c)
1 2 (a, d, a, a)
2 3 (a, a, b, b, b)
In [13]: df["product"] = df["product"].apply(set).apply(tuple)
In [14]: df
Out[14]:
party product
0 1 (c, b, a)
1 2 (a, d)
2 3 (b, a)
Note: as mentioned in the comments, the order of the products is not preserved. If you want to preserve the order, you can use a custom function in place of chaining set and tuple.
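One possible shape for such a custom function is sketched below. It is not taken from the original answer: the name `unique_in_order` is hypothetical, and the sketch relies on `dict` keys keeping insertion order, which Python guarantees from version 3.7 on.

```python
def unique_in_order(items):
    """Return the distinct items as a tuple, keeping first-seen order."""
    # dict keys are unique and, since Python 3.7, keep insertion order.
    return tuple(dict.fromkeys(items))


# The three product tuples from the example table.
rows = [
    ("a", "a", "a", "a", "b", "c"),
    ("a", "d", "a", "a"),
    ("a", "a", "b", "b", "b"),
]

deduped = [unique_in_order(row) for row in rows]
print(deduped)  # [('a', 'b', 'c'), ('a', 'd'), ('a', 'b')]
```

With the dataframe from this answer, `df["product"].apply(unique_in_order)` could then stand in for the `.apply(set).apply(tuple)` chain.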
Solution 2:[2]
To remove the duplicates from a tuple, you can convert it to a set, which keeps only one copy of each value, and then back to a tuple. You can do it in a single call:
In [1]: a=(1,2,2,1,1,1,1,3)
In [2]: tuple(set(a))
Out[2]: (1, 2, 3)
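For the column in the question, the same call has to run once per row. The `ValueError` most likely arises because `tuple(set(pf['Product']))` treats the whole column as one collection: identical row tuples collapse into one, so the result has fewer items than the frame has rows. A minimal sketch, assuming a small made-up `prod` frame that reuses the question's column names:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
prod = pd.DataFrame({
    "Party Nbr": [1, 1, 1, 2, 2, 3, 3],
    "Product name": ["a", "a", "b", "a", "d", "a", "d"],
})

pf = (
    prod.groupby("Party Nbr")["Product name"]
    .apply(tuple)
    .reset_index()
    .rename(columns={"Product name": "Product"})
)

# set() over the whole column merges identical rows: parties 2 and 3 both
# hold ('a', 'd'), so only 2 distinct tuples remain for 3 rows.
distinct_rows = len(set(pf["Product"]))

# Deduplicate inside each row instead; sorted() makes the order deterministic.
pf["Product"] = pf["Product"].map(lambda items: tuple(sorted(set(items))))
print(pf)
```

Sorting is a choice made for this sketch so that the output is reproducible; it only works when the products can be compared with each other.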
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | suvayu |
| Solution 2 | Maxime Lavaud |
