How to find all possible uniform substrings of a string?
I have a string like
aaabbbbcca
And I'd like to parse all possible uniform substrings from that. So my expected substrings for this string are
['a', 'aa', 'aaa', 'b', 'bb', 'bbb', 'bbbb', 'c', 'cc', 'a']
I tried the following
import re
print(re.findall(r"([a-z])(?=\1*)", "aaabbbbcca"))
# Output: ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'a']
Is this possible with regular expressions? If yes, then how?
Solution 1:[1]
You can use a regex to find streaks of the same character, and then some Python on top to build the smaller streaks.
import re
s = 'aaabbbbcca'
matches = (m.group() for m in re.finditer(r'([a-z])\1*', s))
result = [m[:i] for m in matches for i in range(1, len(m) + 1)]
There's also an itertools solution.
from itertools import groupby
s = 'aaabbbbcca'
matches = (''.join(g) for _, g in groupby(s))
result = [m[:i] for m in matches for i in range(1, len(m) + 1)]
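To make the two steps above concrete, here is a minimal sketch that prints the intermediate streaks and then expands them into prefixes. The function name `uniform_substrings` is only an illustrative choice and does not appear in the original answer.

```python
import re
from itertools import groupby


def uniform_substrings(s):
    """Return every prefix of every run of identical characters in s."""
    # Step 1: split the string into maximal runs, e.g. 'aaabb' -> ['aaa', 'bb'].
    runs = [''.join(group) for _, group in groupby(s)]
    # Step 2: a run of length n contributes its prefixes of length 1..n.
    return [run[:i] for run in runs for i in range(1, len(run) + 1)]


# The regex variant finds the same maximal runs (lowercase letters only).
runs = [m.group() for m in re.finditer(r'([a-z])\1*', 'aaabbbbcca')]
print(runs)
# ['aaa', 'bbbb', 'cc', 'a']

print(uniform_substrings('aaabbbbcca'))
# ['a', 'aa', 'aaa', 'b', 'bb', 'bbb', 'bbbb', 'c', 'cc', 'a']
```

Unlike the regex variant, the groupby variant is not limited to lowercase letters.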
Solution 2:[2]
Using two itertools functions:
from itertools import groupby, accumulate
s = 'aaabbbbcca'
print([a for _, g in groupby(s) for a in accumulate(g)])
Or just with basics:
s = 'aaabbbbcca'
a = ''
print([a := a * (c in a) + c for c in s])
Output for both:
['a', 'aa', 'aaa', 'b', 'bb', 'bbb', 'bbbb', 'c', 'cc', 'a']
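The one-liner with `:=` is compact but easy to misread, and it needs Python 3.8 or later. As a rough sketch, it can be unrolled into an ordinary loop. The name `uniform_prefixes` is made up for this illustration. The expression `a * (c in a) + c` keeps the current run when `c` continues it and resets it to the empty string otherwise.

```python
def uniform_prefixes(s):
    """Loop form of: [a := a * (c in a) + c for c in s], starting from a = ''."""
    result = []
    current = ''  # the run of identical characters built so far
    for c in s:
        if c in current:
            # Same character as the current run: extend it.
            current += c
        else:
            # Different character (or first character): start a new run.
            current = c
        result.append(current)
    return result


print(uniform_prefixes('aaabbbbcca'))
# ['a', 'aa', 'aaa', 'b', 'bb', 'bbb', 'bbbb', 'c', 'cc', 'a']
```

The `accumulate` version works on the same principle: within each group produced by `groupby`, `accumulate` concatenates the characters one at a time, so it yields the growing prefixes of that run.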
Solution 3:[3]
I think this particular problem can be solved with a regex. The answer is based on another answer, where parts of numbers are extracted, and the explanation is the same. Each match is itself empty (zero-width), because the whole pattern is a lookahead. Inside the lookahead, a capture group captures a run of a, b, or c of at least length 1, starting at the current position. Afterward, we simply build a list of the strings captured by that group (group 1). Note that the pattern lists the characters explicitly, so as written it only handles a, b, and c.
import re
s = "aaabbbbcca"
matches = re.finditer(r'(?=(a{1,}|b{1,}|c{1,}))',s)
results = [match.group(1) for match in matches]
print(results)
Output:
['aaa', 'aa', 'a', 'bbbb', 'bbb', 'bb', 'b', 'cc', 'c', 'a']
The output contains the same values as requested, but in a different order: within each run, the longest substring comes first.
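As a possible variation (not part of the original answer), the explicit alternatives can be replaced by a backreference so that any lowercase letter is handled. The sketch below also restores the requested order by reversing each run's matches. It relies on the fact that two adjacent runs never share a character.

```python
import re
from itertools import groupby

s = "aaabbbbcca"

# Group 2 captures one character and \2* extends it over the rest of the run.
# Group 1 wraps the whole run so it can be read from the zero-width match.
pattern = re.compile(r'(?=(([a-z])\2*))')
results = [m.group(1) for m in pattern.finditer(s)]
print(results)
# ['aaa', 'aa', 'a', 'bbbb', 'bbb', 'bb', 'b', 'cc', 'c', 'a']

# Consecutive results starting with the same character belong to one run,
# so reversing each such group yields shortest-first order.
ordered = [r for _, g in groupby(results, key=lambda r: r[0])
           for r in reversed(list(g))]
print(ordered)
# ['a', 'aa', 'aaa', 'b', 'bb', 'bbb', 'bbbb', 'c', 'cc', 'a']
```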
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |
