Find indices of substrings within overlapping brackets [duplicate]
I want to extract and locate the words inside every pair of brackets, parentheses, and braces in a sentence, but I am having trouble with overlapping (nested) delimiters. For example:
[in]: import re
[in]: sentence = '{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'
[in]: pattern = r"\[([^\[\]()]+?)\]|\(([^\[\]()]+?)\)|\{([^\[\]()]+?)\}"
[in]: [(m.start(0), m.end(0), sentence[m.start(0) : m.end(0)]) for m in re.finditer(pattern, sentence)]
[out]: [(0, 4, '{ia}'), (5, 27, '({fascia} antebrachii)')]
It should identify three matches, each with correct indices. Any advice, please?
Solution 1:[1]
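Try the third-party regex module. Its finditer accepts overlapped=True, which allows matches to overlap:
import regex as re  # third-party module (pip install regex), not the standard-library re
sentence = '{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'
# Raw string, so the backslash escapes reach the regex engine unchanged
pattern = r'{[^{}]+}|\[[^\[\]]+\]|\([^\(\)]+\)'
# overlapped=True resumes the search one character after the start of the previous match
[(m.start(0), m.end(0), sentence[m.start(0) : m.end(0)]) for m in re.finditer(pattern, sentence, overlapped=True)]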
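If installing a third-party package is not an option, a similar result can be sketched with the standard-library re module. This is a different technique from the one above, not an equivalent of overlapped=True: the pattern is wrapped in a capturing group inside a lookahead, so each match consumes no characters and the scan can still find a span nested inside another. The variable name matches is only illustrative.

```python
import re

sentence = '{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'

# The same three alternatives, wrapped in a capturing group inside a lookahead.
# The lookahead is zero-width, so the scan advances one position at a time
# and a span nested inside another span is still reported.
pattern = r'(?=(\{[^{}]+\}|\[[^\[\]]+\]|\([^()]+\)))'

# Indices come from group 1, because the overall match (group 0) is empty.
matches = [(m.start(1), m.end(1), m.group(1)) for m in re.finditer(pattern, sentence)]
print(matches)
# [(0, 4, '{ia}'), (5, 27, '({fascia} antebrachii)'), (6, 14, '{fascia}')]
```

Note that the indices must be read from group 1; m.start(0) and m.end(0) are equal here because the lookahead matches an empty string.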
The pattern is also simplified. It matches:
- everything that is not a brace between braces: {[^{}]+}
- everything that is not a bracket between brackets: \[[^\[\]]+\]
- everything that is not a parenthesis between parentheses: \([^\(\)]+\)
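The three sub-patterns can also be tried one at a time. The sketch below uses only the standard-library re module and relies on an observation rather than on anything stated in the answer: a match of one sub-pattern never contains its own delimiter, so two matches of the same sub-pattern cannot overlap, and running each separately and merging the results recovers the nested spans. The labels and the names sub_patterns and found are illustrative. As with the combined pattern, a span that contains a nested delimiter of its own type (for example {a {b} c}) yields only the innermost part.

```python
import re

sentence = '{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'

# One sub-pattern per delimiter type; the keys are illustrative labels.
sub_patterns = {
    'braces': r'\{[^{}]+\}',
    'brackets': r'\[[^\[\]]+\]',
    'parentheses': r'\([^()]+\)',
}

found = []
for label, sub_pattern in sub_patterns.items():
    # Each sub-pattern is scanned independently, so a brace span inside
    # a parenthesis span is reported alongside the enclosing span.
    for m in re.finditer(sub_pattern, sentence):
        found.append((m.start(), m.end(), m.group(), label))

found.sort()  # order by start index, then end index
for item in found:
    print(item)
# (0, 4, '{ia}', 'braces')
# (5, 27, '({fascia} antebrachii)', 'parentheses')
# (6, 14, '{fascia}', 'braces')
```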
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | lemon |
