Regex: group first name and surname from a path string with Python

I need to extract the names from the following strings (folder_names). I made them into raw strings. Some examples:

'.\\\\Jens, Jensen\\\\Rechnungen\\\\Rechnungen 2020\\\\somefoldername'
'.\\Harald, Hardraala\\Rechnungen 2017'
'.\\A - H\\Johan, Johanson\\Rechnungen 2017'
'.\\\\Jens-Haudraf, Johan\\\\Rechnungen\\\\Rechnungen 2020\\\\anotherfoldername'
'.\\A - H\\Funke, Felix'

I want each full name in a single group, but I can't get it to work. This is what I came up with:

r'\\*(\w*\-{0,1},{0,1} {0,1}\w*)'


Solution 1:[1]

The following code extracts the names, assuming the format stays the same, i.e. a one-word name (possibly hyphenated), a comma and a space, then another one-word name (possibly hyphenated).

import re
strings = ['.\\\\Jens, Jensen\\\\Rechnungen\\\\Rechnungen 2020\\\\somefoldername',
'.\\Harald, Hardraala\\Rechnungen 2017',
'.\\A - H\\Johan, Johanson\\Rechnungen 2017',
'.\\\\Jens-Haudraf, Johan\\\\Rechnungen\\\\Rechnungen 2020\\\\anotherfoldername',
'.\\A - H\\Funke, Felix']

matches = [re.search(r"[\w-]+, [\w-]+", s).group() for s in strings]

print(matches)
>>>
['Jens, Jensen', 'Harald, Hardraala', 'Johan, Johanson', 'Jens-Haudraf, Johan', 'Funke, Felix']
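
One caveat: `re.search` returns `None` when a string contains no name-like segment, so calling `.group()` on the result directly would raise an `AttributeError`. A minimal sketch of a guarded variant follows. The helper name `extract_name` and the second example path are assumptions made up for illustration, not part of the original answer:

```python
import re

# Same pattern as in the list comprehension above, compiled once.
# The raw string keeps the \w escape intact.
NAME_PATTERN = re.compile(r"[\w-]+, [\w-]+")

def extract_name(path):
    """Return the first 'Word, Word' segment in path, or None if absent.

    extract_name is a hypothetical helper used only for illustration.
    """
    match = NAME_PATTERN.search(path)
    return match.group() if match else None

paths = [
    '.\\A - H\\Funke, Felix',
    '.\\Rechnungen 2017',  # assumed example of a path without a name
]
names = [extract_name(p) for p in paths]
print(names)
```

Paths without a match then yield `None` instead of aborting the whole loop, which may or may not be the behaviour you want.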

Solution 2:[2]

You could match a backslash followed by word characters with an optional hyphenated part. Then match a comma, a space, and again word characters.

The value is in the first capturing group.

Pattern

\\(\w+(?:-\w+)?, \w+)

In parts

  • \\ Match \
  • ( Capture group 1
    • \w+(?:-\w+)? Match 1+ word chars with an optional - and 1+ word chars
    • , \w+ Match a comma, space and 1+ word chars
  • ) Close group 1
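
The breakdown above can also be written into the pattern itself with `re.VERBOSE`. This is only a sketch of the same pattern in a commented form. The constant name `NAME_IN_PATH` is an assumption for illustration. Because verbose mode ignores whitespace in the pattern, the literal space is written as `[ ]`:

```python
import re

# Verbose-mode rendering of the pattern; whitespace and # comments inside
# the pattern are ignored, so the literal space is spelled [ ].
NAME_IN_PATH = re.compile(r"""
    \\                  # a literal backslash
    (                   # start capture group 1
        \w+(?:-\w+)?    # 1+ word chars, optionally a hyphen and 1+ word chars
        ,[ ]\w+         # comma, space, 1+ word chars
    )                   # end capture group 1
""", re.VERBOSE)

match = NAME_IN_PATH.search('.\\A - H\\Johan, Johanson\\Rechnungen 2017')
print(match.group(1))
```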


Example code

import re

regex = r"\\(\w+(?:-\w+)?, \w+)"
strings = [
    '.\\\\Jens, Jensen\\\\Rechnungen\\\\Rechnungen 2020\\\\somefoldername',
    '.\\Harald, Hardraala\\Rechnungen 2017',
    '.\\A - H\\Johan, Johanson\\Rechnungen 2017',
    '.\\\\Jens-Haudraf, Johan\\\\Rechnungen\\\\Rechnungen 2020\\\\anotherfoldername',
    '.\\A - H\\Funke, Felix'
]

for s in strings:
    matches = re.search(regex, s)
    if matches:
        print(matches.group(1))

Output

Jens, Jensen
Harald, Hardraala
Johan, Johanson
Jens-Haudraf, Johan
Funke, Felix
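
If the two halves of the name are needed separately, the same pattern can be split into two named groups. This is a sketch under assumptions: the group names `before_comma` and `after_comma` are made up for illustration, since the examples do not say which half is the surname.

```python
import re

# Same structure as the pattern above, but each half gets its own named group.
# The group names "before_comma" and "after_comma" are illustrative assumptions.
regex = re.compile(r"\\(?P<before_comma>\w+(?:-\w+)?), (?P<after_comma>\w+)")

s = '.\\\\Jens-Haudraf, Johan\\\\Rechnungen\\\\Rechnungen 2020\\\\anotherfoldername'
m = regex.search(s)
if m:
    print(m.group("before_comma"))
    print(m.group("after_comma"))
```

As in the original pattern, a hyphen is only allowed in the part before the comma.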

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Patrick von Glehn
Solution 2