Regular expression to split multiple strings with different patterns

I am trying to split strings into multiple strings using regex. I have strings like the following:

'1. 10.25% 2. 11% 3. 9.75% 4. 4.3%'
'1.promising.2.inappropriately3.essential.4.intense.'
'1. He has not been attending 2. English classes 3. since one month4. No error'
'1. X got 15 shares2. B got 25 shares3. W got 54. shares.4. Mark got 2.5 shares'

I am expecting output like this:

'1. X got 15 shares', '2. B got 25 shares', '3. W got 54. shares.', '4. Mark got 2.5 shares'
'1. 10.25%'
'2. 11% '
'3. 9.75%'
' 4. 4.3%'

I want to write a single expression that splits all of the given scenarios. I tried the following expression, but it fails in some cases:

re.split(r'(?=[1-9]{1}\.[\s]?[a-zA-Z0-9\.\:\(\)\-\,\% ]+)', string)

python regex

Solution 1:^[1]

I'd suggest searching for each successive number with a (?<!\d)NUM\. pattern (NUM followed immediately by a dot and not preceded by another digit) and splitting only at those positions:

import re

texts = ['1. 10.25% 2. 11% 3. 9.75% 4. 4.3%',
         '1.promising.2.inappropriately3.essential.4.itense.',
         '1. He has not been attending 2. English classes 3. since one month4. No error',
         '1. X got 15 shares2. B got 25 shares3. W got 54. shares.4. Mark got 2.5 shares']

# NUM followed by a dot and not preceded by another digit
pattern = r'(?<!\d){}\.'
for text in texts:
    bps = []   # bullet points collected so far
    prev = 0   # start position of the previous match
    for i in range(1,1000):
        rx = re.compile(pattern.format(i))
        m = rx.search(text, prev)  # look for number i from the previous match start
        if m:
            if prev != m.start():
                bps.append(text[prev:m.start()].strip())
            prev = m.start()
        else:
            break  # number i was not found, so there are no more bullet points
    if prev < len(text) - 1:
        bps.append(text[prev:].strip())  # remaining text after the last match
    print(bps)

Output:

['1. 10.25%', '2. 11%', '3. 9.75%', '4. 4.3%']
['1.promising.', '2.inappropriately', '3.essential.', '4.itense.']
['1. He has not been attending', '2. English classes', '3. since one month', '4. No error']
['1. X got 15 shares', '2. B got 25 shares', '3. W got 54. shares.', '4. Mark got 2.5 shares']
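
To see what the (?<!\d) lookbehind contributes, the following sketch compares the pattern for bullet 4 with and without it on the last sample string. The variable names naive and guarded are illustrative and not part of the original answer.

```python
import re

text = '1. X got 15 shares2. B got 25 shares3. W got 54. shares.4. Mark got 2.5 shares'

# Without the lookbehind, the first "4." found is the one inside "54."
naive = re.search(r'4\.', text)
print(text[naive.start():])    # 4. shares.4. Mark got 2.5 shares

# With (?<!\d), a "4." that directly follows another digit is skipped
guarded = re.search(r'(?<!\d)4\.', text)
print(text[guarded.start():])  # 4. Mark got 2.5 shares
```

Only the guarded match lands on the real bullet marker, which is why "54." does not cause a split in the output above.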

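Note the rx = re.compile(pattern.format(i)) and m = rx.search(text, prev) lines: the pattern is compiled because Pattern.search, unlike the module-level re.search, accepts a start position as its second argument. Here that position is the start of the previous match, so each number is only looked for from the preceding bullet marker onward.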
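
As a small illustration of the start-position argument, the sketch below runs one compiled pattern twice on the last sample string, first from the beginning and then from the end of the first match. The names first and later are only for demonstration.

```python
import re

text = '1. X got 15 shares2. B got 25 shares3. W got 54. shares.4. Mark got 2.5 shares'
rx = re.compile(r'(?<!\d)2\.')

# Default: search from the beginning of the string
first = rx.search(text)
print(text[first.start():first.start() + 4])  # 2. B

# A start position makes the search skip everything before it
later = rx.search(text, first.end())
print(text[later.start():])                   # 2.5 shares
```

The second hit is the decimal 2.5 rather than a bullet, which suggests why the answer looks for each number in sequence instead of matching any digit followed by a dot.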

The range(1,1000) part can be adjusted: 1000 assumes the text contains at most 999 bullet points.
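
If a fixed upper bound is undesirable, one option is to count upward with itertools.count and stop at the first number that is not found. The sketch below wraps the same logic in a function under that assumption; the name split_numbered is hypothetical and does not come from the original answer.

```python
import re
from itertools import count

def split_numbered(text):
    """Split text before each consecutive 'N.' marker (1., 2., 3., ...)."""
    parts = []
    prev = 0  # start position of the previous marker
    for i in count(1):
        m = re.compile(r'(?<!\d){}\.'.format(i)).search(text, prev)
        if m is None:
            break  # marker i is missing, so the numbered list ends here
        if m.start() != prev:
            parts.append(text[prev:m.start()].strip())
        prev = m.start()
    tail = text[prev:].strip()
    if tail:
        parts.append(tail)  # text from the last marker to the end
    return parts

print(split_numbered('1.promising.2.inappropriately3.essential.4.intense.'))
# ['1.promising.', '2.inappropriately', '3.essential.', '4.intense.']
```

The loop still ends as soon as a number in the sequence is missing, so it does the same work as the bounded version without a hard-coded limit.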

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Wiktor Stribiżew
