Python - Return all substrings in the first group of nested parentheses
I want to find an efficient way to select all the sub-strings contained in the first group of nested parentheses from a string.
For example:
input: a d f gsds ( adsd ) adsdaa
output: ( adsd )
input: adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )
output: ( sadad adsads ( adsda ) dsadsa )
input: a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )
output: ( adad ( sad ) sdada asdad )
Notice there could be multiple groups of nested parentheses.
One solution would be to scan the string character by character, incrementing a counter at each opening parenthesis and decrementing it at each closing one, until the counter returns to 0.
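The counter approach can be sketched roughly as follows. This is only a minimal sketch: the function name first_paren_group is hypothetical, and returning None when there is no group (or the first group is never closed) is an assumption about the desired behavior.

```python
from typing import Optional


def first_paren_group(text: str) -> Optional[str]:
    """Return the first balanced top-level '( ... )' group, or None."""
    depth = 0    # number of currently open parentheses
    start = -1   # index of the '(' that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')' and depth > 0:  # a stray ')' before any '(' is ignored
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # no group found, or the first group is never closed


print(first_paren_group('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'))
```

It runs in a single pass over the string and stops as soon as the first group closes.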
I am wondering if there is a simpler way to do it? Maybe with regular expressions?
Thanks
Solution 1:[1]
You can use pyparsing to extract the first group of nested parentheses from a string.
import pyparsing as pp
# Skip everything before the first '(', then capture the balanced group as its original text.
pattern = pp.Regex(r'.*?(?=\()') + pp.original_text_for(pp.nested_expr('(', ')'))
txt = 'a d f gsds ( adsd ) adsdaa'
result = pattern.parse_string(txt)[1]
assert result == '( adsd )'
txt = 'adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'
result = pattern.parse_string(txt)[1]
assert result == '( sadad adsads ( adsda ) dsadsa )'
txt = 'a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )'
result = pattern.parse_string(txt)[1]
assert result == '( adad ( sad ) sdada asdad )'
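* pyparsing can be installed with: pip install pyparsing (the snake_case names used here, such as parse_string and nested_expr, require pyparsing 3.0 or later)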
Note:
If the parentheses in the string are unbalanced (for example a(b(c), a(b)c), etc.), the result may be unexpected or an IndexError may be raised, so be careful. (See: Python extract string in a phrase)
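One way to guard against this is to check that the parentheses are balanced before parsing. Below is a minimal sketch using only the standard library; the helper name is_balanced is hypothetical and not part of pyparsing.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0  # any leftover '(' means the string is unbalanced


print(is_balanced('a d f gsds ( adsd ) adsdaa'))
print(is_balanced('a(b(c)'))
```

Calling this before pattern.parse_string(txt) lets you reject or repair malformed input explicitly instead of relying on the parser's failure mode.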
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | quasi-human |
