Remove duplicates from list but keep by color preference

I have a list that contains duplicates, like this:

"id","name","type","price","color"
"23","item1","t-shirt","37","red"
"56","item66","jumper","3","yellow"
"366","item7","jumper","55","yellow"
"366","item7","jumper","55","red"
"745","item 9","t-shirt","45","green"
"3245","item 12","t-shirt","67","red"
"3245","item 12","t-shirt","67","purple"
"654","item 88","jumper","66","blue"
"2","item 99","jumper","77","purple"
"2","item 99","jumper","77","green"

I want to remove the duplicates but keep one of each, chosen by order of color preference from this table:

1 - Red
2 - Purple
3 - Blue
4 - Green
5 - Yellow

So the final list would look like this:

"id","name","type","price","color"
"23","item1","t-shirt","37","red"
"56","item66","jumper","3","yellow"
"366","item7","jumper","55","red"
"745","item 9","t-shirt","45","green"
"3245","item 12","t-shirt","67","red"
"654","item 88","jumper","66","blue"
"2","item 99","jumper","77","purple"

What is my best approach? Would sorting them by color preference first and then removing duplicates be a workable solution? If so, does anybody have an example of something similar being achieved? I am not sure how to sort by order of preference.



Solution 1:[1]

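Sort the data by id and then by color preference, group the rows by id with groupby, and take the first element of each group. Because the ids are strings, the rows come out in lexicographic id order rather than in their original order.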

from itertools import groupby

data = [
    ["id", "name", "type", "price", "color"],
    ["23", "item1", "t-shirt", "37", "red"],
    ["56", "item66", "jumper", "3", "yellow"],
    ["366", "item7", "jumper", "55", "yellow"],
    ["366", "item7", "jumper", "55", "red"],
    ["745", "item 9", "t-shirt", "45", "green"],
    ["3245", "item 12", "t-shirt", "67", "red"],
    ["3245", "item 12", "t-shirt", "67", "purple"],
    ["654", "item 88", "jumper", "66", "blue"],
    ["2", "item 99", "jumper", "77", "purple"],
    ["2", "item 99", "jumper", "77", "green"]
]
header = data[0]
data = data[1:]
preference = {
    "red": 1,
    "purple": 2,
    "blue": 3,
    "green": 4,
    "yellow": 5
}


def key_function(element):
    color = element[-1]
    return (element[0], preference[color])


data.sort(key=key_function)
print(header)
for group, grouping in groupby(data, key=lambda x: x[0]):
    print(next(grouping))

OUTPUT

['id', 'name', 'type', 'price', 'color']
['2', 'item 99', 'jumper', '77', 'purple']
['23', 'item1', 't-shirt', '37', 'red']
['3245', 'item 12', 't-shirt', '67', 'red']
['366', 'item7', 'jumper', '55', 'red']
['56', 'item66', 'jumper', '3', 'yellow']
['654', 'item 88', 'jumper', '66', 'blue']
['745', 'item 9', 't-shirt', '45', 'green']
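
If the original row order matters, as in the expected list in the question, a single pass with a dictionary avoids the sort altogether. This is a minimal sketch rather than part of the original answer: the function name dedupe_keep_preferred and the variable rows are made up for illustration, and it assumes Python 3.7+ (where dictionaries keep insertion order) and that every color appears in preference.

```python
# Rows from the question, without the header line.
rows = [
    ["23", "item1", "t-shirt", "37", "red"],
    ["56", "item66", "jumper", "3", "yellow"],
    ["366", "item7", "jumper", "55", "yellow"],
    ["366", "item7", "jumper", "55", "red"],
    ["745", "item 9", "t-shirt", "45", "green"],
    ["3245", "item 12", "t-shirt", "67", "red"],
    ["3245", "item 12", "t-shirt", "67", "purple"],
    ["654", "item 88", "jumper", "66", "blue"],
    ["2", "item 99", "jumper", "77", "purple"],
    ["2", "item 99", "jumper", "77", "green"],
]
preference = {"red": 1, "purple": 2, "blue": 3, "green": 4, "yellow": 5}


def dedupe_keep_preferred(rows, preference):
    """Keep one row per id: the one whose color has the lowest rank."""
    best = {}  # id -> best row seen so far, in first-seen order
    for row in rows:
        row_id, color = row[0], row[-1]
        current = best.get(row_id)
        # Replace only when the new row's color is strictly preferred.
        if current is None or preference[color] < preference[current[-1]]:
            best[row_id] = row
    return list(best.values())


result = dedupe_keep_preferred(rows, preference)
for row in result:
    print(row)
```

Reassigning an existing key does not move it, so each id stays at the position where it first appeared.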

Solution 2:[2]

I am assuming the original is a list of lists called list1 (one list for each line, without the header line).

Convert it to a single list where each item is an id followed by a tuple ("name", "type", "price", "color"):

    list2=[]
    for item in list1:
      x= [item[0],tuple(item[1:])]
      list2.append(x)

Now make a dictionary out of list2. This will have no duplicate keys (id numbers), but when an id appears more than once the last line with that id wins, which is not necessarily the one with the preferred color.

    dict1=dict(list2)

If we just sort the dictionary, the sorting will be done by id. So we use the key parameter of sorted. The key is a function; normally a lambda is used here, but a regular named function works too. It takes one parameter, an item of the thing being sorted, and returns something that can be compared to decide the order. We specify the order as a list.

    Order_list=["red","purple","blue","green","yellow"]



    def sort_key(x):
        check = x[1][3]   #(a color)
        key = Order_list.index(check)
        return key  

And now we can sort the dictionary's items (sorting the dictionary itself would pass only the id keys to sort_key):

    sorted(dict1.items(), key = sort_key)

The return value will be a list of (key, value) pairs, with the value of each pair a 4-tuple.
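
Note that sorting after the dictionary is built only orders the rows that survived; it does not choose which duplicate survives. The following is a minimal sketch of one way to combine the steps so that the preferred color is kept: sort by preference first, then keep the first row seen for each id. It reuses the names list1, Order_list, sort_key and dict1 from above, but sort_key here takes a whole row, and the use of setdefault is an assumption of this sketch, not part of the original answer.

```python
# Rows from the question, without the header line.
list1 = [
    ["23", "item1", "t-shirt", "37", "red"],
    ["56", "item66", "jumper", "3", "yellow"],
    ["366", "item7", "jumper", "55", "yellow"],
    ["366", "item7", "jumper", "55", "red"],
    ["745", "item 9", "t-shirt", "45", "green"],
    ["3245", "item 12", "t-shirt", "67", "red"],
    ["3245", "item 12", "t-shirt", "67", "purple"],
    ["654", "item 88", "jumper", "66", "blue"],
    ["2", "item 99", "jumper", "77", "purple"],
    ["2", "item 99", "jumper", "77", "green"],
]
Order_list = ["red", "purple", "blue", "green", "yellow"]


def sort_key(item):
    # Rank of the row's color (last field); lower means more preferred.
    return Order_list.index(item[-1])


dict1 = {}
for item in sorted(list1, key=sort_key):
    # setdefault stores the value only if the id is not already present,
    # so the most preferred row for each id is the one that is kept.
    dict1.setdefault(item[0], tuple(item[1:]))

deduped = list(dict1.items())
for row_id, fields in deduped:
    print(row_id, fields)
```

The result is ordered by color preference rather than by the original position of each row.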

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 William