REGEX_String between strings in a list
From this list:
['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
I would like to reduce it to this list:
['BELMONT PARK', 'EAGLE FARM']
You can see from the first list that the desired words are between '\n' and '('.
My attempted solution is:
for i in x:
    result = re.search('\n(.*)(', i)
    print(result.group(1))
This returns the error 'unterminated subpattern'. Thank you.
Solution 1:[1]
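You’re getting an error because the ( is unescaped. Regardless, escaping it alone will not work if the \n in your text is a literal backslash followed by n (which is what the \\n in the pattern below matches), as you’ll get the following matches: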
\nBELMONT PARK (
\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (
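To make the two points above concrete, here is a minimal sketch. It assumes the `\n` sequences are literal backslash-plus-n characters (as they would be when the text is pasted into a regex tester), so the input is written with raw strings; the variable names are illustrative only.

```python
import re

# The question's pattern: the trailing "(" opens a group that is never closed,
# so the pattern fails to compile.
error_message = None
try:
    re.compile('\n(.*)(')
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Assumption: "\n" in the text is a literal backslash followed by "n".
raw_items = [r'AUSTRALIA\nBELMONT PARK (WA',
             r'\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']

# Escaping the parenthesis makes the pattern valid, but the match starts at
# the first literal \n and runs up to the parenthesis.
escaped = re.compile(r'\\n(.*)\(')
full_matches = [escaped.search(s).group(0) for s in raw_items]
print(full_matches)
```

The second full match still contains everything from the first `\n` onward, which is why the extra lookarounds are suggested next.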
You can try the following:
(?<=\\n)(?!.*\\n)(.*)(?= \()
(?<=\\n): Positive lookbehind to ensure \n is before the match
(?!.*\\n): Negative lookahead to ensure no further \n is included
(.*): Your match
(?= \(): Positive lookahead to ensure a space and ( come after the match
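The lookaround pattern above can be tried in Python roughly as follows. This is a sketch under two assumptions: the first variant treats `\n` as a literal backslash plus n (matching the `\\n` in the pattern), and the second swaps in `\n` so it matches real newline characters, as in the Python list from the question. The helper `extract` is a hypothetical name introduced for illustration.

```python
import re

def extract(pattern, items):
    """Return group 1 of the first match in each item, skipping non-matches."""
    return [m.group(1) for m in map(pattern.search, items) if m]

# Variant 1 (assumption): the text contains a literal backslash followed by "n".
literal_items = [r'AUSTRALIA\nBELMONT PARK (WA',
                 r'\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
literal_pattern = re.compile(r'(?<=\\n)(?!.*\\n)(.*)(?= \()')

# Variant 2 (assumption): the text contains real newline characters.
newline_items = ['AUSTRALIA\nBELMONT PARK (WA',
                 '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
newline_pattern = re.compile(r'(?<=\n)(?!.*\n)(.*)(?= \()')

literal_result = extract(literal_pattern, literal_items)
newline_result = extract(newline_pattern, newline_items)
print(literal_result)
print(newline_result)
```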
Solution 2:[2]
You can get the matches without using any lookarounds, as you are already using a capture group.
\n(.*) \(
Explanation
\n: Match a newline
(.*): Capture group 1, match any character except a newline, as much as possible
 \(: Match a space and (
See a regex101 demo and a Python demo.
Example
import re
x = ['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
pattern = r"\n(.*) \("
for i in x:
    m = re.search(pattern, i)
    if m:
        print(m.group(1))
Output
BELMONT PARK
EAGLE FARM
If you want to return a list:
x = ['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
pattern = r"\n(.*) \("
res = [m.group(1) for i in x for m in [re.search(pattern, i)] if m]
print(res)
Output
['BELMONT PARK', 'EAGLE FARM']
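One thing to keep in mind about "as much as possible" in the explanation: if a line happened to contain more than one " (", the greedy `.*` would capture up to the last one. The input below is hypothetical (it is not in the question's data), and the lazy `.*?` variant is an alternative sketch rather than part of the original answer.

```python
import re

# Hypothetical input with two " (" sequences on the same line.
item = 'AUSTRALIA\nEAGLE FARM (QLD) (R3'

greedy = re.search(r"\n(.*) \(", item)   # .* takes as much as possible
lazy = re.search(r"\n(.*?) \(", item)    # .*? stops at the first " ("

greedy_result = greedy.group(1)
lazy_result = lazy.group(1)
print(greedy_result)
print(lazy_result)
```

For the data shown in the question, both variants give the same result, since each line has only one " (".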
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |