Remove duplicates from list but keep by color preference
I have a list that contains duplicates, like this:
"id","name","type","price","color"
"23","item1","t-shirt","37","red"
"56","item66","jumper","3","yellow"
"366","item7","jumper","55","yellow"
"366","item7","jumper","55","red"
"745","item 9","t-shirt","45","green"
"3245","item 12","t-shirt","67","red"
"3245","item 12","t-shirt","67","purple"
"654","item 88","jumper","66","blue"
"2","item 99","jumper","77","purple"
"2","item 99","jumper","77","green"
I want to remove the duplicates but keep one row per id, chosen by color preference in the order given by this table:
1 - Red
2 - Purple
3 - Blue
4 - Green
5 - Yellow
So the final list would look like this:
"id","name","type","price","color"
"23","item1","t-shirt","37","red"
"56","item66","jumper","3","yellow"
"366","item7","jumper","55","red"
"745","item 9","t-shirt","45","green"
"3245","item 12","t-shirt","67","red"
"654","item 88","jumper","66","blue"
"2","item 99","jumper","77","purple"
What is my best approach? Would sorting the rows by color preference first and then removing duplicates be a workable solution? If so, does anybody have an example of something similar being achieved? I am not sure how to sort by order of preference.
Solution 1:[1]
Sort the data by id and then by color preference, group the rows by id (the first element) with groupby, and take the first row of each group.
from itertools import groupby
data = [
    ["id", "name", "type", "price", "color"],
    ["23", "item1", "t-shirt", "37", "red"],
    ["56", "item66", "jumper", "3", "yellow"],
    ["366", "item7", "jumper", "55", "yellow"],
    ["366", "item7", "jumper", "55", "red"],
    ["745", "item 9", "t-shirt", "45", "green"],
    ["3245", "item 12", "t-shirt", "67", "red"],
    ["3245", "item 12", "t-shirt", "67", "purple"],
    ["654", "item 88", "jumper", "66", "blue"],
    ["2", "item 99", "jumper", "77", "purple"],
    ["2", "item 99", "jumper", "77", "green"]
]
# Split the header row from the data rows.
header = data[0]
data = data[1:]
# Lower number = more preferred color.
preference = {
    "red": 1,
    "purple": 2,
    "blue": 3,
    "green": 4,
    "yellow": 5
}
def key_function(element):
    # Sort by id first, then by color preference within each id.
    color = element[-1]
    return (element[0], preference[color])
data.sort(key=key_function)
print(header)
# groupby needs rows with the same id to be adjacent, which the sort guarantees.
for group, grouping in groupby(data, key=lambda x: x[0]):
    # The first row of each group has the most preferred color.
    print(next(grouping))
OUTPUT
['id', 'name', 'type', 'price', 'color']
['2', 'item 99', 'jumper', '77', 'purple']
['23', 'item1', 't-shirt', '37', 'red']
['3245', 'item 12', 't-shirt', '67', 'red']
['366', 'item7', 'jumper', '55', 'red']
['56', 'item66', 'jumper', '3', 'yellow']
['654', 'item 88', 'jumper', '66', 'blue']
['745', 'item 9', 't-shirt', '45', 'green']
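Note that the output above is ordered by id as a string, not in the original row order shown in the question. If the original order matters, a minimal sketch of an alternative is a single pass that remembers the best row per id in a dict. The function name dedupe_by_preference is made up for illustration, and the sketch assumes Python 3.7+ (where dicts keep insertion order) and that every color appears in the preference table.

```python
# Lower number = more preferred color (taken from the question's table).
preference = {"red": 1, "purple": 2, "blue": 3, "green": 4, "yellow": 5}

rows = [
    ["23", "item1", "t-shirt", "37", "red"],
    ["56", "item66", "jumper", "3", "yellow"],
    ["366", "item7", "jumper", "55", "yellow"],
    ["366", "item7", "jumper", "55", "red"],
    ["745", "item 9", "t-shirt", "45", "green"],
    ["3245", "item 12", "t-shirt", "67", "red"],
    ["3245", "item 12", "t-shirt", "67", "purple"],
    ["654", "item 88", "jumper", "66", "blue"],
    ["2", "item 99", "jumper", "77", "purple"],
    ["2", "item 99", "jumper", "77", "green"],
]


def dedupe_by_preference(rows, preference):
    """Keep one row per id (row[0]), choosing the most preferred color (row[-1])."""
    best = {}  # id -> best row seen so far
    for row in rows:
        current = best.get(row[0])
        # Replace the stored row only if this one has a better (lower) rank.
        # Reassigning an existing key keeps its original position in the dict.
        if current is None or preference[row[-1]] < preference[current[-1]]:
            best[row[0]] = row
    return list(best.values())


result = dedupe_by_preference(rows, preference)
for row in result:
    print(row)
```

With the sample data this keeps the ids in the order 23, 56, 366, 745, 3245, 654, 2, which matches the final list in the question.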
Solution 2:[2]
I am assuming the original is a list of lists (one list for each line), called list1 here.
Convert it to a single list where each item is an id followed by a tuple ("name", "type", "price", "color").
list2 = []
for item in list1:
    x = [item[0], tuple(item[1:])]
    list2.append(x)
Now make a dictionary out of list2. This will have no duplicate keys (id numbers), but for a duplicated id the last row in the list wins, which is not necessarily the preferred color.
dict1=dict(list2)
If we just sort the dictionary, the sorting will be done by id. So we use the key parameter of sorted. The key is a function; normally a lambda is used here, but a regular named function works too. It should take one parameter, an item of the thing being sorted, and return something that can be compared to determine the order. We specify the order as a list.
Order_list = ["red", "purple", "blue", "green", "yellow"]
def sort_key(x):
    check = x[1][3]  # the color of an (id, tuple) pair
    key = Order_list.index(check)
    return key
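One caveat with list.index as a sort key: it raises ValueError for a color that is not in the list, and it scans the list on every call. A possible variation, sketched here under the assumption that unknown colors should simply sort last, is to precompute a rank dict. The names rank and color_rank and the color "orange" are illustrative and not part of the original answer.

```python
order_list = ["red", "purple", "blue", "green", "yellow"]

# Map each color to its position once, instead of calling list.index per item.
rank = {color: position for position, color in enumerate(order_list)}


def color_rank(color):
    # Unknown colors get a rank after every known color (an assumption).
    return rank.get(color, len(order_list))


colors = ["green", "yellow", "orange", "red", "blue"]
print(sorted(colors, key=color_rank))
# → ['red', 'blue', 'green', 'yellow', 'orange']
```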
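And now we can sort the items of our dictionary (sorting dict1 itself would iterate over the keys only, so we pass dict1.items()):
sorted(dict1.items(), key=sort_key)
The return value will be a list of (id, value) pairs, with the value of each pair a 4-tuple.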
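Because dict() keeps the last row it sees for a repeated id, the preference has to be applied before the dictionary is built. One way to do that, shown here as a rough sketch on a subset of the question's rows, is to sort from least to most preferred so the preferred row is the one that overwrites the others. The names preference_rank, by_id and ordered are illustrative, not from the original answer.

```python
list1 = [
    ["56", "item66", "jumper", "3", "yellow"],
    ["366", "item7", "jumper", "55", "yellow"],
    ["366", "item7", "jumper", "55", "red"],
    ["2", "item 99", "jumper", "77", "purple"],
    ["2", "item 99", "jumper", "77", "green"],
]
order_list = ["red", "purple", "blue", "green", "yellow"]


def preference_rank(row):
    # Position of the row's color in the preference list (0 = most preferred).
    return order_list.index(row[-1])


# Least preferred rows come first, so for a repeated id the most preferred
# row is assigned last and overwrites the earlier ones.
by_id = {}
for row in sorted(list1, key=preference_rank, reverse=True):
    by_id[row[0]] = tuple(row[1:])

# Optionally order the surviving (id, tuple) pairs by color preference.
ordered = sorted(by_id.items(), key=lambda pair: order_list.index(pair[1][3]))
for item_id, fields in ordered:
    print(item_id, fields)
```

Here id 366 ends up with red and id 2 with purple, as the question requires.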
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | William |
