Regex: drop numbers with some symbols

I am trying to clean my text, so I need to remove standalone numbers as well as certain combinations of numbers and symbols.

I have a string

s = '4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282'

And I want to get

'pm from our side a more detailed analysis4'

I tried to use

re.compile(r'\b(?:/|-|\+|\:)(\d+)\b').sub(r' ', s)

but it returns

'4   2   pm from our side a more detailed analysis4 +7 (495) 797  77 '

What am I doing wrong, and how can I drop just the numbers and the combinations of numbers and specific symbols?



Solution 1:[1]

You could match tokens that contain at least one non-word character surrounded by optional digits, remove them, and then trim the result:

(?<!\S)\d*(?:[^\w\s]+\d*)+\s*

Explanation

  • (?<!\S) Assert a whitespace boundary to the left
  • \d* Match optional digits
  • (?:[^\w\s]+\d*)+ Match, one or more times, at least one non-word, non-whitespace character followed by optional digits
  • \s* Match optional trailing whitespace characters
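
To see the explanation above in action, here is a minimal sketch that lists every substring the pattern consumes on the question's sample string. It uses only the standard `re` module; the variable names are illustrative.

```python
import re

# Pattern from this answer, compiled once.
pattern = re.compile(r"(?<!\S)\d*(?:[^\w\s]+\d*)+\s*")
s = "4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282"

# Collect the full text of each match (group 0), including the trailing whitespace.
matches = [m.group(0) for m in pattern.finditer(s)]
print(matches)
# ['4/13/2022 ', '2:20:03 ', '+7 ', '(495) ', '797-8700 ', '77-8282']
```

Note that "analysis4" is untouched: the trailing 4 is preceded by a non-whitespace character, so the lookbehind fails there, and the word itself contains no non-word character.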

import re

pattern = r"(?<!\S)\d*(?:[^\w\s]+\d*)+\s*"
s = "4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282 kl-1381033 substr1.substr2.ab-2021-44228.a"

print(re.sub(pattern, "", s))

Output

pm from our side a more detailed analysis4 kl-1381033 substr1.substr2.ab-2021-44228.a
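
The "trim the result" step matters when the last token is removed, because the pattern only consumes whitespace after a match, not before it. A minimal sketch, assuming a hypothetical helper named `clean_text` (not part of the original answer):

```python
import re

# Pattern from this answer, compiled once for reuse.
PATTERN = re.compile(r"(?<!\S)\d*(?:[^\w\s]+\d*)+\s*")

def clean_text(text):
    """Hypothetical helper: remove the matched tokens, then trim leftover whitespace."""
    return PATTERN.sub("", text).strip()

s = "4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282"
print(repr(clean_text(s)))
# 'pm from our side a more detailed analysis4'
```

Because the digits are optional on both sides, the pattern also removes standalone punctuation tokens: `clean_text("a - b")` gives `'a b'`. Whether that is desirable depends on the text being cleaned.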

Solution 2:[2]

Try this expression:

(?:\/|-|\+|\:|^|\(|\)| ) ?(\d+)

You can test it here: https://regex101.com/r/aANxBR/1
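
The answer gives only the expression. As a rough sketch, assuming it is meant to be used with `re.sub` and an empty replacement followed by `strip()` (the answer does not say so), it could be applied to the question's string like this:

```python
import re

# Expression from this answer; using it with sub("") and strip() is an assumption.
pattern = re.compile(r"(?:\/|-|\+|\:|^|\(|\)| ) ?(\d+)")
s = '4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282'

cleaned = pattern.sub("", s).strip()
print(repr(cleaned))
# 'pm from our side a more detailed analysis4'

# Caveat: a hyphen followed by digits is removed even inside a word.
print(pattern.sub("", "kl-1381033"))
# kl
```

Unlike Solution 1, this expression has no whitespace boundary on the left, so identifiers such as kl-1381033 lose their numeric part.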

Solution 3:[3]

It appears you want to remove words that start with digits and symbols.

You could do:

import re 

s = '4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282 kl-1381033 substr1.substr2.ab-2021-44228.a'

>>> ' '.join(w for w in s.split() if not re.match(r'[\d(+]\S+', w))
'pm from our side a more detailed analysis4 kl-1381033 substr1.substr2.ab-2021-44228.a'

Or a pure-Python solution without a regex:

bad_start='0123456789+('
>>> ' '.join(w for w in s.split() if w[0] not in bad_start)
'pm from our side a more detailed analysis4 kl-1381033 substr1.substr2.ab-2021-44228.a'
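
The two variants above agree on the sample string but not on every input. The sketch below wraps each in a function (the function names and the sample sentence are made up for illustration) to show where they differ:

```python
import re

BAD_START = "0123456789+("

def drop_by_regex(text):
    # Drops tokens that start with a digit, '(' or '+' AND have at least one more character.
    return " ".join(w for w in text.split() if not re.match(r"[\d(+]\S+", w))

def drop_by_first_char(text):
    # Drops every token whose first character is a digit, '+' or '('.
    return " ".join(w for w in text.split() if w[0] not in BAD_START)

sample = "call 4 times +7 (495) 797-8700 ok"
print(drop_by_regex(sample))       # call 4 times ok
print(drop_by_first_char(sample))  # call times ok
```

The regex version keeps single-character tokens such as "4" because `\S+` requires a second character; changing it to `\S*` would make the two behave the same. Both versions also drop tokens like "3rd", which may or may not be wanted.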

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 LCMa
Solution 3