Is it possible to write a Python regexp with something like an AND operator?
I can't find a nice way to combine several regexps into one such that the input string is checked against all sub-regexps, like this:
def match(input_str: str, regexp: str) -> bool:
    ...
print(match('abaaca', '.*aba.*<AND>.*aca.*'))  # True
print(match('abaca', '.*aba.*<AND>.*aca.*'))  # True, it doesn't matter that one letter a is shared
print(match('abac', '.*aba.*<AND>.*aca.*'))  # False
Is there a better way to do this than parsing the regexp to see whether it contains <AND>, splitting it into several sub-regexps, and matching each one in a loop?
UPD: to be clear, I am looking for a way to use it as a full-featured operator, in cases like ((a<AND>b)|(c<AND>d))<AND>e, which should match all of the strings abe, bae, cde and dce. Not just a single <AND>, but several, mixed with parentheses.
Solution 1:[1]
A regex solution
using positive lookahead groups (?=<sub>), which check for a match without consuming any characters
import re
def match(input_str: str, regexp: str) -> bool:
    return re.search("".join([f"(?={sub})" for sub in re.split('<AND>', regexp)]), input_str) is not None
print(match('abaaca', '.*aba.*<AND>.*aca.*')) # True
print(match('abaca', '.*aba.*<AND>.*aca.*')) # True, it doesn't matter that one letter a is shared
print(match('abac', '.*aba.*<AND>.*aca.*')) # False
=>
True
True
False
The oneliner is equivalent to
def match(input_str: str, regexp: str):
    subs = re.split('<AND>', regexp)  # getting the sub patterns
    # next 3 lines create a pattern from the sub patterns
    pattern = ""
    for sub in subs:
        pattern = pattern + "(?=" + sub + ")"  # positive lookahead syntax
    matches = re.search(pattern, input_str)
    return matches is not None
For the example pattern '.*aba.*<AND>.*aca.*' the modified pattern is (?=.*aba.*)(?=.*aca.*)
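One thing worth checking with this construction: all lookaheads are evaluated at the same position, so each sub-pattern needs its own leading .* (as in the examples above) to be able to match at a different place in the string. A small sketch illustrating this, where build_pattern is a hypothetical helper name introduced here and not part of the answer:

```python
import re

def build_pattern(regexp: str) -> str:
    # Hypothetical helper: wrap each <AND>-separated part in a positive lookahead.
    return "".join(f"(?={sub})" for sub in regexp.split("<AND>"))

with_wildcards = build_pattern(".*aba.*<AND>.*aca.*")
without_wildcards = build_pattern("aba<AND>aca")

print(with_wildcards)     # (?=.*aba.*)(?=.*aca.*)
print(without_wildcards)  # (?=aba)(?=aca)

# Each lookahead scans ahead on its own, so both substrings are found.
print(re.search(with_wildcards, "abaca") is not None)     # True
# Both lookaheads must succeed at the same index, which is impossible here.
print(re.search(without_wildcards, "abaca") is not None)  # False
```

In other words, the sub-patterns passed to match are expected to describe the whole remainder of the string, not just the fragment being looked for.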
Solution 2:[2]
You can use a function to check whether all patterns match the string
import re
def matchall(patterns, string):
    return all([re.search(pattern, string) for pattern in patterns])
print(matchall([".*aba.*", ".*aca.*"], "abaaca")) # True
Edit: 10.06.2022
Using regex lookahead
(?=.*aba.*)(?=.*aca.*).*
Explanation
(?= : Lookahead assertion - assert that the following regex matches
.*aba.* : Match but not capture the .*aba.* substrings
) : Close lookahead
(?= : Lookahead assertion - assert that the following regex matches
.*aca.* : Match but not capture the .*aca.* substrings
) : Close lookahead
.* : Match the whole string if the previous lookarounds both matched
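To try the pattern above from Python, a minimal sketch could look like this. The helper name has_both is an assumption made for illustration, and re.fullmatch is chosen here because the trailing .* is meant to consume the whole string:

```python
import re

# Both lookaheads are checked at the start; the final .* then consumes the string.
PATTERN = re.compile(r"(?=.*aba.*)(?=.*aca.*).*")

def has_both(text: str) -> bool:
    # Hypothetical helper: True when the text contains both aba and aca.
    return PATTERN.fullmatch(text) is not None

for s in ["abaaca", "abaca", "abac"]:
    print(s, "=>", has_both(s))
# abaaca => True
# abaca => True
# abac => False
```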
Solution 3:[3]
"it doesn't matter that one letter a is shared"
No, you can't do this with only one regex. From the documentation for match():
"If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object."
In other words, the entire regex must match at the beginning of the string. Even if you changed to re.search(), you will still have to match the entire regex somewhere in the input string. And re.findall() searches for non-overlapping matches.
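The behaviour of the three functions mentioned above can be checked with a short sketch (the sample strings are chosen here purely for illustration):

```python
import re

# re.match only tries the pattern at the beginning of the string.
print(re.match("aba", "xaba"))          # None
# re.search scans forward and reports the first location that matches.
print(re.search("aba", "xaba").span())  # (1, 4)
# re.findall returns non-overlapping matches: 'aca' shares an 'a' with 'aba',
# so it is not reported.
print(re.findall("a.a", "abaca"))       # ['aba']
```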
Solution 4:[4]
Basically, the regex checks strings for aba.*aca OR aca.*aba. The lookbehind is necessary because there might be an a that is part of both subpatterns.
import re
regex = r"aba.*(?<=a)ca|aca.*(?<=a)ba"
for s in ['abaaca', 'abaca', 'abac', 'aaacacabbaba', 'abababaca', 'abbbacaaaba']:
    print(s, '=>', bool(re.search(regex, s)))
output:
abaaca => True
abaca => True
abac => False
aaacacabbaba => True
abababaca => True
abbbacaaaba => True
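To see why the lookbehind matters, one can compare against a version without it. The name naive below is introduced only for this comparison:

```python
import re

naive = r"aba.*aca|aca.*aba"                      # needs two separate middle letters
with_lookbehind = r"aba.*(?<=a)ca|aca.*(?<=a)ba"  # lets the middle 'a' be shared

for s in ["abaaca", "abaca"]:
    print(s, bool(re.search(naive, s)), bool(re.search(with_lookbehind, s)))
# abaaca True True
# abaca False True
```

The lookbehind (?<=a) only checks that the preceding character is an a without consuming it, so the final a of aba can also serve as the first a of aca.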
Solution 5:[5]
Building on Artyom Vancyan's answer, I would iterate over a list of precompiled regular expressions, which avoids looking the patterns up again on every call and can help if the function is called many times.
import re
# Compile the sub-patterns once, outside the function
expressions = [re.compile(r'aba'), re.compile(r'aca')]
def match_expressions(expressions, string_to_match):
    return all([expression.search(string_to_match) for expression in expressions])
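One possible way to package the precompiled patterns is a small factory that returns a reusable predicate. This is only a sketch: compile_all and matches_all are hypothetical names, and aba and aca are assumed to be the sub-patterns of interest.

```python
import re
from typing import Callable, Iterable

def compile_all(patterns: Iterable[str]) -> Callable[[str], bool]:
    # Compile every sub-pattern once, up front.
    compiled = [re.compile(p) for p in patterns]

    def matches_all(text: str) -> bool:
        # The generator lets all() stop at the first pattern that fails.
        return all(rx.search(text) for rx in compiled)

    return matches_all

contains_both = compile_all([r"aba", r"aca"])
for s in ["abaaca", "abaca", "abac"]:
    print(s, "=>", contains_both(s))
# abaaca => True
# abaca => True
# abac => False
```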
Solution 6:[6]
import re
def match(input_str: str, regexp: str) -> bool:
    pattern = "".join(
        [f"(?={condition})" for condition in regexp.split("<AND>")]
    )
    return bool(re.findall(pattern, input_str))
print(match("abaaca", ".*aba.*<AND>.*aca.*")) # True
print(match("abaca", ".*aba.*<AND>.*aca.*")) # True, it doesn't matter that one letter a is shared
print(match("abac", ".*aba.*<AND>.*aca.*")) # False
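Splitting on <AND> does not carry over to the nested case from the question's update, because the split cuts through the parentheses. The sketch below shows what the split produces for that example, followed by a hand-written lookahead pattern for the same example. It assumes that a<AND>b means "contains a and contains b", which is one reading of the question and not something this answer states.

```python
import re

nested = "((a<AND>b)|(c<AND>d))<AND>e"

# Naive splitting cuts through the parentheses.
parts = nested.split("<AND>")
print(parts)  # ['((a', 'b)|(c', 'd))', 'e']

# The joined pattern happens to compile, but it no longer expresses the intent.
split_based = "".join(f"(?={p})" for p in parts)
print(re.search(split_based, "abe"))  # None

# Hand-written pattern under the "contains" interpretation (an assumption).
by_hand = re.compile(r"(?:(?=.*a)(?=.*b)|(?=.*c)(?=.*d))(?=.*e)")
for s in ["abe", "bae", "cde", "dce", "ab", "ace"]:
    print(s, "=>", by_hand.match(s) is not None)
# abe => True
# bae => True
# cde => True
# dce => True
# ab => False
# ace => False
```

A general <AND> operator with arbitrary nesting would need a real parser for the expression. The hand-written pattern only covers this single example.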
Solution 7:[7]
The following patterns cover almost all cases for the two substrings aba and aca.
import re
# Regex if order is important, i.e. aba must come before aca
ordered_pattern = r'.*ab(a.*a|a)ca.*'
# Regex if order is not important, i.e. either aba or aca can come first
pattern = r'.*a(b(a.*a|a)c|c(a.*a|a)b)a.*'
# Inputs that should not match
strings = ['abac', 'aba_ca', 'acab', '_ab_ca_', 'aca ba', '_ababa_test_aba_']
for s in strings:
    print(s, '=>', bool(re.search(pattern, s)))  # False for each input
# Inputs that should match
strings = ['abaca', 'acaba', 'aca_test_aba', '_aba_test_aca_', 'acaaba', 'abaaca']
for s in strings:
    print(s, '=>', bool(re.search(pattern, s)))  # True for each input
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Diego Queiroz |
| Solution 2 | |
| Solution 3 | Code-Apprentice |
| Solution 4 | |
| Solution 5 | Joaquim Procopio |
| Solution 6 | |
| Solution 7 | |
