One-liner to remove duplicates, keep ordering of list [duplicate]
I have the following list:
['Herb', 'Alec', 'Herb', 'Don']
I want to remove duplicates while keeping the order, so the result would be:
['Herb', 'Alec', 'Don']
Here is how I would do this verbosely:
l_new = []
for item in l_old:
    if item not in l_new:
        l_new.append(item)
Is there a way to do this in a single line?
Solution 1:[1]
You could use a set to remove duplicates and then restore the ordering by sorting on each element's original position. It's just as slow as your original, though, since l_old.index rescans the list for every element :-)
>>> sorted(set(l_old), key=l_old.index)
['Herb', 'Alec', 'Don']
Solution 2:[2]
Using pandas, create a series from the list, drop duplicates, and then convert it back to a list.
import pandas as pd
>>> pd.Series(['Herb', 'Alec', 'Herb', 'Don']).drop_duplicates().tolist()
['Herb', 'Alec', 'Don']
Timings
The solution from @StefanPochmann (Solution 1) is the clear winner for lists with heavy duplication.
my_list = ['Herb', 'Alec', 'Don'] * 10000
%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.11 ms per loop
%timeit list(OrderedDict.fromkeys(my_list))
# 100 loops, best of 3: 16.1 ms per loop
%timeit sorted(set(my_list), key=my_list.index)
# 1000 loops, best of 3: 396 µs per loop
For larger lists with no duplication (e.g. simply a range of numbers), the pandas solution is very fast, while the index-based sort degrades badly.
my_list = range(10000)
%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.16 ms per loop
%timeit list(OrderedDict.fromkeys(my_list))
# 100 loops, best of 3: 10.8 ms per loop
%timeit sorted(set(my_list), key=my_list.index)
# 1 loop, best of 3: 716 ms per loop
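The timings above use OrderedDict.fromkeys without showing it as a standalone answer; for completeness, here is that one-liner, plus the plain-dict form that works on Python 3.7+ (where dicts are guaranteed to preserve insertion order):

```python
from collections import OrderedDict

l_old = ['Herb', 'Alec', 'Herb', 'Don']

# fromkeys keeps the first occurrence of each key, in insertion order
print(list(OrderedDict.fromkeys(l_old)))  # ['Herb', 'Alec', 'Don']

# On Python 3.7+, plain dicts preserve insertion order, so this works too
print(list(dict.fromkeys(l_old)))  # ['Herb', 'Alec', 'Don']
```

Both run in linear time, unlike the index-based sort.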
Solution 3:[3]
If you really don't care about optimizations and stuff you can use the following:
s = ['Herb', 'Alec', 'Herb', 'Don']
[x for i, x in enumerate(s) if x not in s[:i]]
Note that, in my opinion, you really should use the for loop from your question or the answer by @juanpa.arrivillaga.
Solution 4:[4]
You can try this:
l = ['Herb', 'Alec', 'Herb', 'Don']
data = [a for _, a in sorted((i, a) for a, i in {a: i for i, a in enumerate(l)}.items())]
Output:
['Alec', 'Herb', 'Don']
This algorithm keeps the last occurrence of each duplicated value (it removes the earlier instances), which is why 'Herb' comes after 'Alec' in the output.
Solution 5:[5]
l_new = []
for item in l_old:
    if item not in l_new:
        l_new.append(item)
In one line..ish:
l_new = []
[l_new.append(item) for item in l_old if item not in l_new]
Which has the behavior:
>>> a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
>>> b = []
>>> [b.append(item) for item in a if item not in b]
[None, None, None, None, None]
>>> print(b)
[1, 2, 3, 4, 5]
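The loop in the question (and this comprehension) does a linear scan of l_new for every item, which is O(n²) overall. A common variant, not one of the answers above, pairs the output list with a set so membership tests are O(1); the function name here is illustrative:

```python
def dedupe(items):
    """Remove duplicates from items, keeping first-occurrence order."""
    seen = set()   # O(1) membership tests instead of scanning the output list
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

print(dedupe(['Herb', 'Alec', 'Herb', 'Don']))  # ['Herb', 'Alec', 'Don']
```

This only works when the items are hashable, which is also a requirement for the set- and dict-based one-liners above.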
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stefan Pochmann |
| Solution 2 | Olivier |
| Solution 3 | Dekel |
| Solution 4 | Ajax1234 |
| Solution 5 | |
