Regex: string contains several specific characters, exactly k times
I have a list of words, and I want to filter them based on specific characters and the number of times each character has to appear, in no particular order. All other characters can appear any number of times. For example:
Filter all the words that contain the letter "a" exactly once and the letter "b" exactly twice.
"bbad" or "bxab" should match, "bbaad" should not.
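The requirement itself can be stated without a regex. As a minimal sketch in Python (the helper name has_exact_counts and the required mapping are illustrative, not from the question), a word qualifies when each listed character occurs exactly the given number of times, while unlisted characters are unconstrained.

```python
from collections import Counter


def has_exact_counts(word: str, required: dict[str, int]) -> bool:
    """Return True if each character in `required` occurs exactly
    `required[ch]` times in `word` (hypothetical helper).

    Characters not listed in `required` may appear any number of times.
    """
    counts = Counter(word)
    # Counter returns 0 for missing keys, so an absent character counts as 0.
    return all(counts[ch] == n for ch, n in required.items())


required = {"a": 1, "b": 2}
words = ["bbad", "bxab", "bbaad"]
print([w for w in words if has_exact_counts(w, required)])  # ['bbad', 'bxab']
```

A plain check like this can also serve as a reference when testing a regex that is generated at runtime.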
So far I have arrived at this regex, which doesn't specify the number of times each character appears:
\b(?=[^\Wa]*a)(?=[^\Wb]*b)\w+\b
I tried:
\b(?=[^\Wa]{1})(?=[^\Wb]{2})\w+\b
but that doesn't work. I also want the regex to be somewhat modular, because the desired characters are determined at runtime.
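As a quick illustration in Python, the question's two patterns can be applied to the example words with fullmatch (the extra word "xyz" is an illustrative input, not from the question). The first pattern only checks that 'a' and 'b' are present, so it also accepts "bbaad". In the second, [^\Wa]{1} and [^\Wb]{2} only inspect the first one or two characters of the word, so it rejects "bbad" and "bxab" (both start with 'b') yet accepts "xyz".

```python
import re

# The question's two patterns, copied verbatim.
presence = re.compile(r"\b(?=[^\Wa]*a)(?=[^\Wb]*b)\w+\b")
attempt = re.compile(r"\b(?=[^\Wa]{1})(?=[^\Wb]{2})\w+\b")

for word in ["bbad", "bxab", "bbaad", "xyz"]:
    # fullmatch applies each pattern to the whole word
    print(word, bool(presence.fullmatch(word)), bool(attempt.fullmatch(word)))

# bbad True False
# bxab True False
# bbaad True False
# xyz False True
```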
Thank you for your time and help!
Solution 1:[1]
One could use the following regular expression (which could be constructed programmatically) to match words that contain exactly one 'a' and exactly two 'b's.
\b(?=[^b]*(?:b[^b]*){2}\b)(?=[^a]*a[^a]*\b)\w*
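If the word had to contain three 'b's rather than two, one would change {2} to {3}. Note that [^b] and [^a] also match spaces and punctuation, so the pattern is reliable when each word is tested on its own; when it is searched for in a longer string of words, a lookahead can run past the end of the current word (searching "xx bbad", for instance, matches "xx"). Writing [^\Wb] and [^\Wa] instead, as the question's own pattern does, keeps each lookahead within the current word.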
The regular expression can be broken down as follows.
\b # match a word boundary
(?= # begin a positive lookahead
[^b]* # match >= 0 chars other than 'b'
(?:b[^b]*) # match 'b' followed by >= 0 chars other than 'b' in
# a non-capture group
{2} # execute the non-capture group twice
\b # match a word boundary
) # end positive lookahead
(?= # begin a positive lookahead
[^a]* # match >= 0 chars other than 'a'
a # match 'a'
[^a]* # match >= 0 chars other than 'a'
\b # match a word boundary
) # end positive lookahead
\w* # match >= 0 word chars
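Because the question needs the characters and counts to be chosen at runtime, the lookahead structure above can be generated from a mapping of character to required count. The following is a sketch in Python under stated assumptions: build_exact_count_pattern is a hypothetical helper name; every key is assumed to be a single word character (letter, digit or underscore); [^\W…] classes are used instead of [^…] so each lookahead stays within the current word; and the end-of-word check is written (?!\w) rather than \b, so that a count of 0 is not satisfied by the boundary at the start of the word.

```python
import re


def build_exact_count_pattern(required: dict[str, int]) -> re.Pattern:
    """Compile a regex matching whole words in which each key of `required`
    occurs exactly `required[key]` times (hypothetical helper).

    Assumes every key is a single word character; other characters in the
    word may appear any number of times.
    """
    lookaheads = []
    for ch, count in required.items():
        c = re.escape(ch)
        # A word character other than ch; excluding \W keeps the lookahead
        # from running past the end of the current word.
        other = rf"[^\W{c}]"
        # Non-ch word chars, then exactly `count` copies of (ch + non-ch word
        # chars), then no further word character (the end of the word).
        lookaheads.append(rf"(?={other}*(?:{c}{other}*){{{count}}}(?!\w))")
    return re.compile(r"\b" + "".join(lookaheads) + r"\w+\b")


pattern = build_exact_count_pattern({"a": 1, "b": 2})
words = ["bbad", "bxab", "bbaad"]
print([w for w in words if pattern.fullmatch(w)])  # ['bbad', 'bxab']
print(pattern.findall("bbad bxab bbaad xx bbad"))  # ['bbad', 'bxab', 'bbad']
```

Changing a count in the mapping plays the role of changing {2} to {3} by hand, and a count of 0 requires the character to be absent from the word.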
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
