How do I remove items that don't consist only of certain characters from a list?
I need to remove the items that contain characters other than "-" and "." from a random list.
For example:
I have this list:
['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
An item in the list may only consist of "-" and ".", so the output needs to be:
['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']
If we take another random list:
[".-","-...","-.-.","-..",".","..-. teveel kolommen",".---"]
Then the output needs to be:
[".-","-...","-.-.","-..",".",".---"]
Can someone please explain to me how I can do this without using a function?
Solution 1:[1]
Use set operations:
>>> [s for s in lst if set(s).issubset(set(".-"))]
Examples:
>>> lst = ['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
>>> [s for s in lst if set(s).issubset(set(".-"))]
['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']
>>> lst = [".-","-...","-.-.","-..",".","..-. teveel kolommen",".---"]
>>> [s for s in lst if set(s).issubset(set(".-"))]
['.-', '-...', '-.-.', '-..', '.', '.---']
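If "without using a function" means without defining your own function or calling helpers such as `set()`, one possible reading is plain nested loops. The following is a minimal sketch under that assumption (`allowed` and `result` are illustrative names, not from the answer above). It relies on Python's `for ... else`: the `else` branch runs only when the inner loop finishes without hitting `break`.

```python
lst = ['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
allowed = ".-"  # the only characters an item may contain

result = []
for s in lst:
    for ch in s:
        if ch not in allowed:
            break  # found a forbidden character, skip this item
    else:
        # the inner loop never hit `break`, so every character is allowed
        result.append(s)

print(result)  # ['.-', '-...', '-.-.', '-..', '.', '.---', '-.-']
```

This is slower and longer than the set-based one-liner, but it shows exactly what the comprehension checks for each item.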
Solution 2:[2]
Benchmarks of several more versions (time per call, for each of the two example lists):
['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-']
   mean   stdev (from best 5 of 50 attempts)
1.62 μs 0.00 μs filter__issuperset
1.71 μs 0.00 μs filterfalse__re_search
2.14 μs 0.00 μs listcomp__count
2.27 μs 0.00 μs filter__re_fullmatch
2.29 μs 0.00 μs filter__re_match
2.41 μs 0.00 μs filter__re_search
2.66 μs 0.00 μs compress__count
2.70 μs 0.00 μs listcomp__issubset
4.70 μs 0.00 μs listcomp__not_re_search
5.38 μs 0.00 μs listcomp__re_fullmatch
5.57 μs 0.00 μs listcomp__re_search
['.-', '-...', '-.-.', '-..', '.', '..-. teveel kolommen', '.---']
   mean   stdev (from best 5 of 50 attempts)
1.64 μs 0.00 μs filterfalse__re_search
1.66 μs 0.00 μs filter__issuperset
1.94 μs 0.00 μs listcomp__count
2.08 μs 0.00 μs filter__re_fullmatch
2.10 μs 0.00 μs filter__re_match
2.38 μs 0.00 μs filter__re_search
2.48 μs 0.00 μs compress__count
2.65 μs 0.00 μs listcomp__issubset
4.21 μs 0.00 μs listcomp__not_re_search
4.78 μs 0.00 μs listcomp__re_fullmatch
5.13 μs 0.00 μs listcomp__re_search
cpython 3.10.4 (main, Apr 13 2022, 16:06:53) [GCC 10.2.1 20210110]
The issuperset solution should become even faster in Python 3.11, since set.issuperset has finally been optimized there (or rather un-deoptimized... I've long thought it shouldn't turn the whole iterable into a set first).
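Unlike `listcomp__issubset`, which builds both `set(s)` and `set(".-")` for every item, the issuperset version builds the allowed set once and passes each string directly, because `set.issuperset` accepts any iterable, not only a set (before 3.11 it still converts that argument to a set internally, as noted above). A minimal sketch of the same idea outside the benchmark harness; `allowed`, `result` and `result_filter` are illustrative names:

```python
allowed = set(".-")  # built once, reused for every item

lst = [".-", "-...", "-.-.", "-..", ".", "..-. teveel kolommen", ".---"]

# set.issuperset accepts any iterable, so each string is passed as-is;
# it is True when every character of the string is in `allowed`
result = [s for s in lst if allowed.issuperset(s)]

# the bound method can also be handed straight to filter()
result_filter = list(filter(allowed.issuperset, lst))

print(result)  # ['.-', '-...', '-.-.', '-..', '.', '.---']
```

Note that an empty string passes this check, since every set is a superset of the empty set.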
Code:
def listcomp__issubset(lst):
    return [s for s in lst if set(s).issubset(set(".-"))]

def filter__issuperset(lst):
    return [*filter(set('.-').issuperset, lst)]

def listcomp__re_search(lst):
    return [s for s in lst if re.search(r'^[-.]*$', s)]

def listcomp__re_fullmatch(lst):
    return [s for s in lst if re.fullmatch(r'[-.]*', s)]

def listcomp__not_re_search(lst):
    return [s for s in lst if not re.search(r'[^-.]', s)]

def filter__re_search(lst):
    return [*filter(re.compile(r'^[-.]*$').search, lst)]

def filter__re_fullmatch(lst):
    return [*filter(re.compile(r'[-.]*').fullmatch, lst)]

def filter__re_match(lst):
    return [*filter(re.compile(r'[-.]*$').match, lst)]

def filterfalse__re_search(lst):
    return [*filterfalse(re.compile(r'[^-.]').search, lst)]

def listcomp__count(lst):
    return [s for s in lst if s.count('.') + s.count('-') == len(s)]

def compress__count(lst):
    dots = map(str.count, lst, repeat('.'))
    hyps = map(str.count, lst, repeat('-'))
    return [*compress(lst, map(eq, map(add, dots, hyps), map(len, lst)))]
funcs = [
    listcomp__issubset,
    filter__issuperset,
    listcomp__re_search,
    listcomp__re_fullmatch,
    listcomp__not_re_search,
    filter__re_search,
    filter__re_fullmatch,
    filter__re_match,
    filterfalse__re_search,
    listcomp__count,
    compress__count,
]
from timeit import timeit
from random import shuffle
from bisect import insort
from statistics import mean, stdev
import re
from itertools import filterfalse, compress, repeat
from operator import add, eq
import sys
def test(lst, expect):
    print(lst)
    # check that every version returns the expected result
    for func in funcs:
        result = func(lst)
        assert result == expect, func.__name__
    # time each version 50 times in shuffled order, keeping each list sorted
    times = {func: [] for func in funcs}
    for _ in range(50):
        shuffle(funcs)
        for func in funcs:
            number = 1000
            t = timeit(lambda: func(lst), number=number) / number
            insort(times[func], t)
    print('   mean   stdev (from best 5 of 50 attempts)')
    for func in sorted(funcs, key=times.get):
        ts = times[func]
        del ts[5:]  # keep only the 5 fastest runs
        print(*('%.2f μs ' % (t * 1e6) for t in [mean(ts), stdev(ts)]),
              func.__name__)
    print()
test(['.-', '-...', '-.-.', '-..', '.', '.p..', '.---', '-.-'],
     ['.-', '-...', '-.-.', '-..', '.', '.---', '-.-'])
test([".-","-...","-.-.","-..",".","..-. teveel kolommen",".---"],
     [".-","-...","-.-.","-..",".",".---"])
print(sys.implementation.name, sys.version)
Solution 3:[3]
Use re.search with a regular expression:
import re
new_lst = [s for s in lst if re.search(r'^[-.]*$', s)]
Here, ^ matches the start of the string, $ matches the end of the string, [-.] is a character class containing two characters (dash and period), and * is a quantifier meaning: repeat the previous item zero or more times.
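Two edge cases of this pattern are worth knowing. Because * allows zero repetitions, the empty string '' is kept (the set-based answers keep it too). Also, $ matches just before a trailing newline as well as at the very end, so a string like '.-\n' passes re.search(r'^[-.]*$', s), while re.fullmatch must consume the whole string and rejects it. A small comparison, assuming you want such strings rejected (`samples` is made-up test data):

```python
import re

samples = ["", ".-", ".-\n", ".p."]

# ^...$ : `$` may also match right before a final "\n"
search_kept = [s for s in samples if re.search(r'^[-.]*$', s)]

# fullmatch must consume the whole string, newline included
fullmatch_kept = [s for s in samples if re.fullmatch(r'[-.]*', s)]

print(search_kept)     # ['', '.-', '.-\n']
print(fullmatch_kept)  # ['', '.-']
```

Ending the pattern with \Z instead of $ would close the same gap for re.search.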
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | not_speshal |
| Solution 2 | |
| Solution 3 | |
