Split string until a 5-7 digit number is found in Python

I have strings like the following:

1338516 -...pair - 5pk 1409093 -...re Wax 3Pk
1409085 -...dtnr - 5pk 1415090 -...accessories
490663 - 3 pack 1490739 -...2 - 3 pack

I want to split each string at the second number, so that the first part is 1338516 -...pair - 5pk and the second part is 1409093 -...re Wax 3Pk.

Currently, I'm able to extract the numbers using the following code:

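import re

# reqText is the list of raw text lines; keep only those containing '...'
lst = list(filter(lambda k: '...' in k, reqText))
lst1 = ''.join(lst)
numbers = re.findall(r'\d+', lst1)
# keep only the longer numbers (more than 3 digits)
numbers1 = [x for x in numbers if len(x) > 3]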

Any suggestions?

Solution 1:^[1]

You could use re.split with a pattern:

[^\S\n]+(?=\d{5,7}\b)

Explanation

[^\S\n]+ Match 1 or more whitespace characters other than a newline
(?=\d{5,7}\b) Positive lookahead, assert 5-7 digits to the right followed by a word boundary

import re

pattern = r"[^\S\n]+(?=\d{5,7}\b)"

lst = [
    "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk",
    "1409085 -...dtnr - 5pk 1415090 -...accessories",
    "490663 - 3 pack 1490739 -...2 - 3 pack"
]

for s in lst:
    print(re.split(pattern, s))

Output

['1338516 -...pair - 5pk', '1409093 -...re Wax 3Pk']
['1409085 -...dtnr - 5pk', '1415090 -...accessories']
['490663 - 3 pack', '1490739 -...2 - 3 pack']
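
The choice of [^\S\n]+ over \s+ only matters when the input spans several lines. The following is a minimal sketch of that difference, assuming a hypothetical two-line input built from the sample strings (the variable names are illustrative, not from the original answer):

```python
import re

# Hypothetical two-line input: the first line holds one item, the second holds two.
text = "1338516 -...pair - 5pk\n1409085 -...dtnr - 5pk 1415090 -...accessories"

# [^\S\n]+ cannot consume the newline, so no split happens at the line break.
same_line_only = re.split(r"[^\S\n]+(?=\d{5,7}\b)", text)

# \s+ also matches the newline, so the line break becomes a split point too.
any_whitespace = re.split(r"\s+(?=\d{5,7}\b)", text)

print(same_line_only)
print(any_whitespace)
```

With single-line strings, as in the question, both patterns should behave the same.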

Another option could be a matching approach:

\b\d{5,7}\b.*?(?=[^\S\n]+\d{5,7}\b|$)
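
As a sketch of how this matching pattern might be used, re.findall can collect every item directly instead of splitting. The variable name results is an assumption made for illustration:

```python
import re

# Match a 5-7 digit number, then as little as possible up to the next
# whitespace-plus-number or the end of the string.
pattern = r"\b\d{5,7}\b.*?(?=[^\S\n]+\d{5,7}\b|$)"

lst = [
    "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk",
    "1409085 -...dtnr - 5pk 1415090 -...accessories",
    "490663 - 3 pack 1490739 -...2 - 3 pack",
]

# The pattern has no capturing groups, so findall returns the full matches.
results = [re.findall(pattern, s) for s in lst]

for parts in results:
    print(parts)
```

Unlike the split version, this only returns text that starts with a 5-7 digit number, so anything before the first such number is dropped.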

Solution 2:^[2]

You can use

^(.+?)\s*\b(\d{5,7}\b.*)

In Python, use a raw string literal to declare this regex:

pattern = r'^(.+?)\s*\b(\d{5,7}\b.*)'

Details:

^ - start of string
(.+?) - Group 1: one or more (but as few as possible) occurrences of any char other than line break chars
\s* - zero or more whitespaces
\b - a word boundary
(\d{5,7}\b.*) - Group 2: a five- to seven-digit number, a word boundary, and the rest of the line.

A Python example:

import re
text = "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk"
pattern = r'^(.+?)\s*\b(\d{5,7}\b.*)'
m = re.search(pattern, text)
if m:
    print(m.group(1)) # => 1338516 -...pair - 5pk
    print(m.group(2)) # => 1409093 -...re Wax 3Pk
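
The same pattern can be applied to several strings at once. The sketch below wraps it in a hypothetical helper, split_at_number (not part of the original answer), which returns the input unchanged with None as the second part when no second number is found:

```python
import re

pattern = re.compile(r'^(.+?)\s*\b(\d{5,7}\b.*)')

def split_at_number(text):
    """Hypothetical helper: return (first_part, second_part).

    second_part is None when the pattern does not match.
    """
    m = pattern.search(text)
    if m is None:
        return text, None
    return m.group(1), m.group(2)

samples = [
    "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk",
    "490663 - 3 pack 1490739 -...2 - 3 pack",
    "1338516 -...pair - 5pk",  # only one item, so there is nothing to split
]

pairs = [split_at_number(s) for s in samples]
for first, second in pairs:
    print(first, "|", second)
```

Note that this pattern splits at the first qualifying number after the start of the string only, so a line with three or more items would keep everything after the first split in the second part.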

If you need to use it on a pandas DataFrame column, you can use

df[['result_col_1', 'result_col_2']] = df['source'].str.extract(pattern, expand=True)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
