Regular expression to split a string on commas outside parentheses with more than one nesting level in Python
I have a string like this in Python:
import re
filter = "eq(Firstname,test),eq(Lastname,ltest),OR(eq(ContactID,12345),eq(ContactID,123456))"
rx_comma = re.compile(r"(?:[^,(]|\([^)]*\))+")
result = rx_comma.findall(filter)
Actual result is:
['eq(Firstname,test)', 'eq(Lastname,ltest)', 'OR(eq(ContactID,12345)', 'eq(ContactID,123456))']
Expected result is:
['eq(Firstname,test)', 'eq(Lastname,ltest)', 'OR(eq(ContactID,12345),eq(ContactID,123456))']
Any help is appreciated.
Solution 1:[1]
Although the OP's issue has already been solved with the regex module, I'd like to introduce pyparsing as an alternative solution. It can be installed with the following command:
pip install pyparsing
Code:
import pyparsing as pp
s = "eq(Firstname,test),eq(Lastname,ltest),OR(eq(ContactID,12345),eq(ContactID,123456))"
expr = pp.delimited_list(pp.original_text_for(pp.Regex(r'.*?(?=\()') + pp.nested_expr('(', ')')))
output = expr.parse_string(s).as_list()
assert output == ['eq(Firstname,test)', 'eq(Lastname,ltest)', 'OR(eq(ContactID,12345),eq(ContactID,123456))']
Explanation:
The key is the definition of expr in the code above. Here it is again with explanatory comments:
pp.delimited_list(                      # Split the input at the default delimiter, a comma
    pp.original_text_for(               # Return the original text instead of the parsed nested lists
        pp.Regex(r'.*?(?=\()')          # Match everything up to (but not including) the next opening parenthesis '('
        + pp.nested_expr('(', ')')      # Parse the parenthesized group, including any nested parentheses
    )
)
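For comparison, if adding a dependency is not an option, the same split can be sketched with a plain depth counter using only the standard library. This is a different technique from the pyparsing expression above, not a translation of it. The function name split_top_level is hypothetical, and the sketch assumes that parentheses are the only grouping characters and that they are balanced.

```python
def split_top_level(text, delimiter=","):
    """Split text on delimiter, skipping delimiters nested inside parentheses.

    Hypothetical helper for illustration; assumes '(' and ')' are the only
    grouping characters.
    """
    parts = []
    depth = 0   # current parenthesis nesting level
    start = 0   # index where the current piece begins
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at position {index}")
        elif char == delimiter and depth == 0:
            # A delimiter at nesting level 0 ends the current piece.
            parts.append(text[start:index])
            start = index + 1
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    parts.append(text[start:])  # the last piece has no trailing delimiter
    return parts


s = "eq(Firstname,test),eq(Lastname,ltest),OR(eq(ContactID,12345),eq(ContactID,123456))"
print(split_top_level(s))
```

Because the counter tracks the nesting depth explicitly, it handles any number of levels, which is what the single-level pattern \([^)]*\) in the question cannot do.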
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | quasi-human |
