Regex to Find All Digits and Dashes in a String

I am trying to convert the string '4-6,10-12,16' into the list [4,"-",6,10,"-",12,16]. The list should contain a mix of integers and the "-" character as a string.

I tried using a regex in Python, but I could only get it to extract the numbers. I need the dashes in the list as well. How can I include the dashes along with the numbers?

Here is my code:

interval='4-6,10-12,16'
import re
l=[int(s) for s in re.findall(r'\b\d+\b', interval)]


Solution 1:[1]

Try this:

interval='4-6,10-12,16'
import re
l=[int(s) if s.isnumeric() else s for s in re.findall(r'\d+|-', interval)]
l

Output:

[4, '-', 6, 10, '-', 12, 16]

Solution 2:[2]

You can use

import re
interval='4-6,10-12,16'
l=[int(s) if all(c.isdigit() for c in s) else '-' for s in re.findall(r'\d+|-', interval)]
print(l) # => [4, '-', 6, 10, '-', 12, 16]


Details:

  • re.findall(r'\d+|-', interval) extracts digit sequences or - chars
  • int(s) if all(c.isdigit() for c in s) else '-' either casts a digit sequence to an int if the whole match consists of digits, or just returns - as a string.
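
To make the two steps above concrete, here is a small sketch that prints the intermediate result of re.findall before the conversion. The names tokens and result are only illustrative, and the conversion is written with a simple comparison against '-' rather than the digit check used above.

```python
import re

interval = '4-6,10-12,16'

# Step 1: findall returns every non-overlapping match as a string.
# Commas match neither alternative of the pattern, so they are skipped.
tokens = re.findall(r'\d+|-', interval)
print(tokens)  # ['4', '-', '6', '10', '-', '12', '16']

# Step 2: convert digit runs to int and leave dashes as strings.
result = [int(t) if t != '-' else t for t in tokens]
print(result)  # [4, '-', 6, 10, '-', 12, 16]
```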

Solution 3:[3]

Useful functions:

  • str.isdigit (or str.isnumeric or str.isdecimal);
  • itertools.groupby to group adjacent characters that share a characteristic.

from itertools import groupby

def tokenize_digits_and_dashes(s):
    for k, g in groupby(s, key=lambda c: (c.isdigit(), c == '-')):
        if k == (True, False):
            yield int(''.join(g))
        elif k == (False, True):
            yield '-'

print(list(tokenize_digits_and_dashes('4-6,10-12,16')))
# [4, '-', 6, 10, '-', 12, 16]
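
A note on the three predicates listed above: they do not agree on every Unicode character, and int() rejects characters such as '²' even though str.isdigit accepts them. For plain ASCII input like the string in the question this makes no difference. The sketch below only illustrates the distinction; the sample characters are my own picks.

```python
# Compare the three predicates on an ASCII digit, a superscript and a fraction.
samples = ['7', '²', '½']
table = {ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric()) for ch in samples}
for ch, flags in table.items():
    print(repr(ch), flags)

# int() refuses the superscript even though isdigit() says True.
try:
    int('²')
    superscript_ok = True
except ValueError:
    superscript_ok = False
print(superscript_ok)  # False
```

If the input may contain such characters, str.isdecimal is probably the safest key for the groupby above.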

Alternative approach

Your string already contains separators in the form of commas. These are useful! Don't ignore them. You can split the string on the separators using str.split.

def tokenize_intervals(s):
    for interval in s.split(','):
        i = interval.split('-')
        if len(i) == 2:
            yield tuple(int(w) for w in i)
        elif len(i) == 1:
            x = int(i[0])
            yield (x, x)

print(list(tokenize_intervals('4-6,10-12,16')))
# [(4, 6), (10, 12), (16, 16)]
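
If you still need the flat list from the question, the tuples can be turned back into it. This is only a sketch: flatten_intervals is a name I made up, and it assumes that a tuple with equal ends, such as (16, 16), stands for a single value, so an input like '4-4' would come out as just 4.

```python
def flatten_intervals(pairs):
    """Turn (start, end) tuples back into the flat form from the question."""
    flat = []
    for start, end in pairs:
        if start == end:
            flat.append(start)              # single value such as 16
        else:
            flat.extend([start, '-', end])  # a real interval such as 4-6
    return flat

pairs = [(4, 6), (10, 12), (16, 16)]
print(flatten_intervals(pairs))
# [4, '-', 6, 10, '-', 12, 16]
```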

Solution 4:[4]

# By Using Regex #
# -------------- #

import re
interval = '4-6,10-12,16'
s_list = re.findall(r'\d+|-', interval)  # \d+ matches a run of digits; [\d+]+ would also match a literal '+'
x = [int(_) if _.isnumeric() else _ for _ in s_list]
print(x)

# By Using the split method #
# ------------------------- #
final_list = []
for part in interval.split(','):
    sub_list = part.split('-')
    for idx, i in enumerate(sub_list):
        if i.isnumeric():
            final_list.append(int(i))
        # add a dash after every item except the last one in this part
        if idx < len(sub_list) - 1:
            final_list.append('-')
print(final_list)

# By Checking Character By Character #
# ---------------------------------- #
z = ""  # digits of the number currently being read
s = []
for ch in interval:
    if ch.isnumeric():
        z += ch
        continue
    if z:  # any non-digit ends the current number
        s.append(int(z))
        z = ""
    if ch == '-':
        s.append('-')
if z:  # flush the last number
    s.append(int(z))
print(s)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1
Solution 2  Wiktor Stribiżew
Solution 3
Solution 4  Pulakesh Dhara