Python re: how to capture 12"" / 14""

I need to capture patterns like this one:

12"" / 14""

in

"Factory SP1 150 12"" / 14"""

The numbers change (always two digits); the rest doesn't.
Note that the double quotes at both ends of the string are part of the string, not delimiters.

Also note that I'm working with pandas and using .str.extract(pattern).

My code:

import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(r'filename.csv', delimiter=';', usecols=["OLD_COLUMN", "OTHER_COLUMNS"], encoding='utf-8', on_bad_lines='skip')

pattern = r'(\d{2}""\s*/\s*\d{2}"")'

df["NEW_COLUMN"] = df["OLD_COLUMN"].str.extract(pattern)

I have tried changing the groups and escaping every character, but I can't get it to match.



Solution 1:[1]

You can use r'\d{2}""\s*/\s*\d{2}""' as the regex (it is your pattern without the capture group):

import re

s = '"Factory SP1 150 12"" / 14"""'
re.findall(r'\d{2}""\s*/\s*\d{2}""', s)

output:

['12"" / 14""']
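Since the question extracts into a DataFrame column, here is a minimal sketch of the same check with Series.str.extract. It assumes the cell really holds the doubled quotes (the one-row frame below is a hypothetical stand-in for the CSV data) and passes expand=False so the single capture group comes back as a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical one-row frame; the cell holds the literal text
# "Factory SP1 150 12"" / 14""" including the outer double quotes.
df = pd.DataFrame({"OLD_COLUMN": ['"Factory SP1 150 12"" / 14"""']})

# Same pattern as in the question, with one capturing group for str.extract.
pattern = r'(\d{2}""\s*/\s*\d{2}"")'

# expand=False returns a Series (one group) instead of a one-column DataFrame.
df["NEW_COLUMN"] = df["OLD_COLUMN"].str.extract(pattern, expand=False)

print(df["NEW_COLUMN"].iloc[0])  # 12"" / 14""
```

If this works on the hand-built frame but not on the real column, the text in the column is probably not what it looks like in the file.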

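Be careful with your strings: written as a Python literal, "Factory SP1 150 12"" / 14""" is equivalent to "Factory SP1 150 12" + " / 14" + "" (adjacent string literals are concatenated), i.e. 'Factory SP1 150 12 / 14', which contains no double quotes at all.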
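A small sketch of two places where the doubled quotes can disappear before the regex ever sees them. The first part shows the literal concatenation described above. The second part assumes (the question does not confirm this) that the CSV field is quoted exactly as shown, in which case read_csv's default quoting turns each "" into a single ". The CSV text and the single-quote pattern are hypothetical illustrations.

```python
import io

import pandas as pd

# Adjacent string literals are concatenated, so no double quotes survive here.
concatenated = "Factory SP1 150 12"" / 14"""
print(repr(concatenated))  # 'Factory SP1 150 12 / 14'

# Hypothetical CSV text: ';' delimiter, field quoted as in the question.
raw_csv = 'OLD_COLUMN;OTHER_COLUMNS\n"Factory SP1 150 12"" / 14""";x\n'
df = pd.read_csv(io.StringIO(raw_csv), delimiter=";")

# Default quotechar='"' and doublequote=True: the outer quotes are stripped
# and each "" inside the quoted field becomes a single ".
print(df["OLD_COLUMN"].iloc[0])  # Factory SP1 150 12" / 14"

# Hypothetical adjustment: expect a single " after each number instead.
extracted = df["OLD_COLUMN"].str.extract(r'(\d{2}"\s*/\s*\d{2}")', expand=False)
print(extracted.iloc[0])  # 12" / 14"
```

If the doubled quotes need to be kept as they appear in the file, passing quoting=csv.QUOTE_NONE to read_csv should leave quote characters untouched so the original pattern applies; whether that suits the rest of the file depends on how its other fields are quoted.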

Solution 2:[2]

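pattern = r'([0-9]+""\s*/\s*[0-9]+"")'

This is a regex that will match that, along with other expressions like 1351""/1"", because [0-9]+ accepts one or more digits where \d{2} requires two. The raw string is not the problem with your original pattern, though: with the r prefix the backslashes are kept as-is in the Python string, and the regex engine then reads \d and \s as "digit" and "whitespace". Your original pattern contains no \" at all, so it does not require backslashes in the text. Without the r prefix, '\s' and '\d' only survive because Python leaves unrecognised escape sequences unchanged (Python 3.12 and later emit a SyntaxWarning for them), so a raw string is the safer way to write regexes.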
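A quick check with the standard re module, using the sample string from the question: it compares the raw-string pattern with the same pattern spelled with doubled backslashes, and shows an input that [0-9]+ accepts but \d{2} does not. The test strings are illustrative only.

```python
import re

text = '"Factory SP1 150 12"" / 14"""'

# Raw string: the backslashes reach the regex engine unchanged.
raw_pattern = r'\d{2}""\s*/\s*\d{2}""'
# The same pattern without the r prefix needs every backslash doubled.
escaped_pattern = '\\d{2}""\\s*/\\s*\\d{2}""'
print(raw_pattern == escaped_pattern)  # True
print(re.findall(raw_pattern, text))  # ['12"" / 14""']

# [0-9]+ takes one or more digits; \d{2} needs two digits right before
# each "", so the lone 1 after the slash makes it fail.
loose = r'[0-9]+""\s*/\s*[0-9]+""'
print(re.findall(loose, '1351""/1""'))  # ['1351""/1""']
print(re.findall(raw_pattern, '1351""/1""'))  # []
```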

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: mozway
[2] Solution 2: Chad S.