'Optimizing finding a string that matches the characters in a substring in any order?
Assuming a list as follows:
list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']
and a sub string
to_find = 'seos'
I would like to find the string(s) in the list_of_strings that:
- Have the same length as
to_find - Have the same characters as
to_find(irresepective of the order of the characters)
The output from the list_of_strings should be 'sseo', 'oess'] (since it has all the letters from to_find & all have a length of 4)
I have:
import itertools
list_of_strings = [string for string in list_of_strings if len(string) == len(to_find)]
result = [string for string in list_of_strings if any("".join(perm) in string for perm in itertools.permutations(to_find))]
To find how long does it take to run the code I did
import timeit
timeit.timeit("[string for string in list_of_strings if any(''.join(perm) in string for perm in itertools.permutations(to_find))]",
setup='from __main__ import list_of_strings, to_find', number=100000)
The process takes a while to give the output. I am guessing it is because of the use of itertools.permutations.
Is there a way I can make this code more efficient?
Thanks
Solution 1:[1]
If order doesn't matter, you can just sort the strings and compare the resulting lists:
list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']
to_find = sorted('seos')
matches = [word for word in list_of_strings if sorted(word) == to_find]
Solution 2:[2]
This should work because Counter creates a dict-like that counts the number of characters in each string and the aim is to match the letters and their counts irrespective of their orders.
from collections import Counter
to_find_counter = Counter(to_find)
# go through the list and check if the Counter is the same as the Counter of to_find
[x for x in list_of_strings if Counter(x)==to_find_counter]
['sseo', 'oess']
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | LukasNeugebauer |
| Solution 2 | not a robot |
