How to remove both numbers and text inside parentheses using regex in Python?

In the following text, I want to remove everything inside the parentheses, including the numbers and the letters. I used the following code but got 22701 instead of 2270. How can I get only 2270 using re.sub? Thanks

import regex as re
import numpy as np
import pandas as pd

text = "2270 (1st xyz)"
text_new = re.sub(r"[a-zA-Z()\s]","",text)
text_new


Solution 1:[1]

Simply use the regex pattern \(.*?\):

import re

text = "2270 (1st xyz)"
text_new = re.sub(r"\(.*?\)", "", text)  # raw string so \( is not treated as a string escape
print(text_new)

Output:

2270 
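For contrast, here is a minimal sketch of why the pattern in the question leaves a stray 1. It reuses the question's sample text; the variable names by_class and by_span are illustrative only. A character class deletes individual characters, and digits are not in the class, whereas the span pattern deletes the whole parenthesised group at once.

```python
import re

text = "2270 (1st xyz)"

# Character class: deletes letters, parentheses and whitespace one character
# at a time. Digits are not in the class, so the "1" of "1st" is left behind.
by_class = re.sub(r"[a-zA-Z()\s]", "", text)

# Span pattern: deletes everything from "(" through the next ")" as one match.
by_span = re.sub(r"\(.*?\)", "", text)

print(repr(by_class))  # '22701'
print(repr(by_span))   # '2270 '
```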

Explanation of the pattern \(.*?\):

  • The \ before each parenthesis tells re to treat it as a literal character; unescaped parentheses are special characters in re (they delimit groups).
  • The . matches any character except the newline character.
  • The * matches zero or more occurrences of the pattern immediately specified before the *.
  • The ? after the * makes the match non-greedy: re matches as little text as possible, so the match stops at the first closing parenthesis.
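The effect of the ? only shows up when a string contains more than one parenthesised group. The sketch below uses a made-up sample string (not from the question) to compare the greedy and non-greedy forms.

```python
import re

# Hypothetical sample with two parenthesised groups, chosen to show the difference.
sample = "2270 (1st xyz) and 3380 (2nd abc)"

# Greedy: .* runs to the LAST closing parenthesis, swallowing " and 3380 " too.
greedy = re.sub(r"\(.*\)", "", sample)

# Non-greedy: .*? stops at the FIRST closing parenthesis, so each group
# is removed separately and the text between them survives.
lazy = re.sub(r"\(.*?\)", "", sample)

print(repr(greedy))  # '2270 '
print(repr(lazy))    # '2270  and 3380 '
```

With a single group, as in the question, both forms give the same result; the non-greedy form is simply the safer default.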

Note the trailing space in the output. To remove it, add a leading space to the pattern:

import re

text = "2270 (1st xyz)"
text_new = re.sub(r" \(.*?\)", "", text)  # leading space in the pattern removes the gap too
print(text_new)

Output:

2270
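A literal space in the pattern only matches when exactly one space precedes the parenthesis. If the input may have no space, a tab, or several spaces there, one possible variant is sketched below. The helper name strip_parenthesized is hypothetical, and the pattern assumes the parentheses are not nested.

```python
import re

# Hypothetical helper, not part of the original answer.
def strip_parenthesized(s: str) -> str:
    # \s* absorbs any whitespace before "(".
    # [^)]* matches everything up to the next ")", including newlines,
    # which "." does not match by default.
    return re.sub(r"\s*\([^)]*\)", "", s).strip()

print(strip_parenthesized("2270 (1st xyz)"))      # 2270
print(strip_parenthesized("2270(1st xyz)"))       # 2270
print(strip_parenthesized("2270\t(1st\nxyz) "))   # 2270
```

With nested parentheses such as "a (b (c) d)", this pattern stops at the first ")" and leaves the rest behind, so nested input would need a different approach.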

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
