REGEX: String between strings in a list

From this list:

['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']

I would like to reduce it to this list:

['BELMONT PARK', 'EAGLE FARM']

You can see from the first list that the desired words are between '\n' and '('.

My attempted solution is:

for i in x:
    result = re.search('\n(.*)(', i)
    print(result.group(1))

This returns the error 'unterminated subpattern'. Thank you.



Solution 1:[1]

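You’re getting an error because the final ( is unescaped, so it opens a group that is never closed. Escaping it alone will not be enough if \n in your data is the two literal characters backslash and n (for example, when the list is pasted into a regex tester), because . then matches across them and you’ll get the following matches: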

  • \nBELMONT PARK (
  • \nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (
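
The matches listed above span several \n sequences, which is what happens when \n is two literal characters. As a minimal sketch, assuming instead that the list holds real newline characters as in the question's Python literal, the following shows the compile error and what the escaped pattern captures. The names compile_error and escaped are illustrative only.

```python
import re

# Assumption: these strings contain real newline characters.
x = ['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']

# The trailing unescaped "(" opens a group that is never closed,
# so the pattern cannot even be compiled.
try:
    re.compile('\n(.*)(')
    compile_error = None
except re.error as exc:
    compile_error = exc
print(compile_error)

# Escaping the parenthesis makes the pattern valid. "." does not match
# a newline by default, so the capture is limited to the last line,
# but the space before "(" is still part of the group.
escaped = [re.search(r'\n(.*)\(', s).group(1) for s in x]
print(escaped)
```

Under that assumption the capture stays on the last line but keeps the space before (, so the pattern needs a further adjustment either way.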

You can try the following:

(?<=\\n)(?!.*\\n)(.*)(?= \()
  • (?<=\\n): Positive lookbehind to ensure \n is before match
  • (?!.*\\n): Negative lookahead to ensure no further \n is included
  • (.*): Your match
  • (?= \(): Positive lookahead to ensure ( is after match
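
A minimal sketch of how this pattern could be used from Python. Note that \\n in a regex matches a literal backslash followed by n, so the pattern as written fits text where \n is two characters. For strings holding real newlines, as in the question's list, the sketch assumes \\n is swapped for \n. The helper extract and the variable names are hypothetical.

```python
import re

def extract(pattern, items):
    """Return group 1 of the first match in each item that matches."""
    found = []
    for s in items:
        m = pattern.search(s)
        if m:
            found.append(m.group(1))
    return found

# Case 1 (assumption): real newline characters, as in the question.
x = ['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
newline_pattern = re.compile(r"(?<=\n)(?!.*\n)(.*)(?= \()")
from_newlines = extract(newline_pattern, x)
print(from_newlines)

# Case 2 (assumption): "\n" is a literal backslash plus "n" (raw strings),
# which is what the pattern with \\n is written for.
raw = [r'AUSTRALIA\nBELMONT PARK (WA', r'\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
literal_pattern = re.compile(r"(?<=\\n)(?!.*\\n)(.*)(?= \()")
from_literal = extract(literal_pattern, raw)
print(from_literal)
```

In both cases the negative lookahead rejects every start position that still has a \n after it, so only the final segment before " (" is captured.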

Solution 2:[2]

You can get the matches without using any lookarounds, as you are already using a capture group.

\n(.*) \(

Explanation

  • \n Match a newline
  • (.*) Capture group 1, match any character except a newline, as much as possible
  • ' \(' Match a space followed by a literal ( (the quotes only make the leading space visible)

See a regex101 demo and a Python demo.

Example

import re

x = ['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
pattern = r"\n(.*) \("

for i in x:
    m = re.search(pattern, i)
    if m:
        print(m.group(1))

Output

BELMONT PARK
EAGLE FARM

If you want to return a list:

x = ['AUSTRALIA\nBELMONT PARK (WA', '\nR3\n1/5/4/2\n2/3/1/5\nEAGLE FARM (QLD']
pattern = r"\n(.*) \("
res = [m.group(1) for i in x for m in [re.search(pattern, i)] if m]
print(res)

Output

['BELMONT PARK', 'EAGLE FARM']

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2