One-liner to remove duplicates, keep ordering of list [duplicate]

I have the following list:

['Herb', 'Alec', 'Herb', 'Don']

I want to remove duplicates while keeping the order, so it would be :

['Herb', 'Alec', 'Don']

Here is how I would do this verbosely:

l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

Is there a way to do this in a single line?



Solution 1:[1]

You could use a set to remove duplicates and then restore the ordering by sorting on each item's first index. It's just as slow as your original, though, since `l_old.index` rescans the list for every element, yeah :-)

>>> sorted(set(l_old), key=l_old.index)
['Herb', 'Alec', 'Don']

Solution 2:[2]

Using pandas, create a series from the list, drop duplicates, and then convert it back to a list.

import pandas as pd

>>> pd.Series(['Herb', 'Alec', 'Herb', 'Don']).drop_duplicates().tolist()
['Herb', 'Alec', 'Don']

Timings

The solution from @StefanPochmann is the clear winner for lists with high duplication.

from collections import OrderedDict

my_list = ['Herb', 'Alec', 'Don'] * 10000

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.11 ms per loop

%timeit list(OrderedDict.fromkeys(my_list))
# 100 loops, best of 3: 16.1 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1000 loops, best of 3: 396 µs per loop

For larger lists with no duplication (e.g. simply a range of numbers), the pandas solution stays fast, while the `index`-based approach degrades to quadratic time.

my_list = range(10000)

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.16 ms per loop

%timeit list(OrderedDict.fromkeys(my_list))
# 100 loops, best of 3: 10.8 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1 loop, best of 3: 716 ms per loop
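The `OrderedDict` one-liner timed above can also be written with a plain `dict` on Python 3.7+, where insertion order is guaranteed by the language; both keep the first occurrence of each value. A sketch:

```python
from collections import OrderedDict

l_old = ['Herb', 'Alec', 'Herb', 'Don']

# fromkeys keeps the first occurrence of each key, in insertion order.
print(list(OrderedDict.fromkeys(l_old)))  # ['Herb', 'Alec', 'Don']

# On Python 3.7+, plain dicts preserve insertion order, so this works too:
print(list(dict.fromkeys(l_old)))  # ['Herb', 'Alec', 'Don']
```

Both run in linear time, unlike the `index`-based sort.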

Solution 3:[3]

If you really don't care about optimization, you can use the following:

s = ['Herb', 'Alec', 'Herb', 'Don']
[x for i, x in enumerate(s) if x not in s[:i]]

Note that, in my opinion, you should really use the `for` loop from your question or the answer by @juanpa.arrivillaga.

Solution 4:[4]

You can try this:

l = ['Herb', 'Alec', 'Herb', 'Don']
data = [a for _, a in sorted((i, a) for a, i in {a: i for i, a in enumerate(l)}.items())]

Output:

['Alec', 'Herb', 'Don']

Note that this keeps the last occurrence of each value, since later indices overwrite earlier ones in the dict comprehension, so the order differs from the one requested ('Herb' now follows 'Alec').
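To keep the first occurrence of each value instead, the index map can be built so earlier indices win, e.g. by enumerating in reverse. A sketch along the same lines as this answer, not from the original:

```python
l = ['Herb', 'Alec', 'Herb', 'Don']

# Reversed enumeration makes earlier indices overwrite later ones,
# so each value ends up mapped to its *first* position.
first = {a: i for i, a in reversed(list(enumerate(l)))}

# Sort by first position to restore the original order.
data = [a for _, a in sorted((i, a) for a, i in first.items())]
print(data)  # ['Herb', 'Alec', 'Don']
```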

Solution 5:[5]

l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

In one line..ish:

l_new = []

[l_new.append(item) for item in l_old if item not in l_new]

Which has the behavior:

> a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
> b = []
> [b.append(item) for item in a if item not in b]
> print(b)
[1, 2, 3, 4, 5]
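The same idea can use an auxiliary set instead of scanning the output list, which makes the membership test O(1) instead of O(n). `set.add` returns `None`, so the `or` clause both records the item and leaves the condition falsy for unseen items. A common idiom, not from the original answer:

```python
a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
seen = set()

# "item in seen" short-circuits for duplicates; otherwise seen.add(item)
# runs (returning None), so the item is recorded and kept.
b = [item for item in a if not (item in seen or seen.add(item))]
print(b)  # [1, 2, 3, 4, 5]
```

Unlike the `l_new.append` comprehension above, this one builds the result list directly rather than as a side effect.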

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Stefan Pochmann
Solution 2 Olivier
Solution 3 Dekel
Solution 4 Ajax1234
Solution 5