Regex expression to find all digits and dashes in a string
I am trying to convert the string '4-6,10-12,16' into a list that looks like this: [4,"-",6,10,"-",12,16]. The list should contain a mix of integers and the special character "-".
I tried using a regex in Python, but I could only get it to extract the numbers; I need the dashes in the list as well. How can I include the dashes along with the numbers?
Here is my code:
interval='4-6,10-12,16'
import re
l=[int(s) for s in re.findall(r'\b\d+\b', interval)]
Solution 1:[1]
Try this:
interval='4-6,10-12,16'
import re
l=[int(s) if s.isnumeric() else s for s in re.findall(r'\d+|-', interval)]
l
Output:
[4, '-', 6, 10, '-', 12, 16]
Solution 2:[2]
You can use
import re
interval='4-6,10-12,16'
l=[int(s) if all(c.isdigit() for c in s) else '-' for s in re.findall(r'\d+|-', interval)]
print(l) # => [4, '-', 6, 10, '-', 12, 16]
Details:
re.findall(r'\d+|-', interval) extracts digit sequences or - chars.
int(s) if all(c.isdigit() for c in s) else '-' either casts a digit sequence to an int if the whole match consists of digits, or just returns - as a string.
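To make the two steps described above easier to see, here is a minimal sketch that prints the raw matches before converting them. The names TOKEN_RE and tokenize are made up for this illustration and do not appear in the answer; the sketch also assumes the input contains only digits, dashes and commas.

```python
import re

# Hypothetical names for illustration only.
TOKEN_RE = re.compile(r'\d+|-')  # a run of digits, or a single dash

def tokenize(interval):
    """Return ints and '-' strings in the order they appear."""
    tokens = TOKEN_RE.findall(interval)  # strings only at this point
    # Every match is either '-' or a digit run, so anything else is an int.
    return [t if t == '-' else int(t) for t in tokens]

print(TOKEN_RE.findall('4-6,10-12,16'))
# ['4', '-', '6', '10', '-', '12', '16']
print(tokenize('4-6,10-12,16'))
# [4, '-', 6, 10, '-', 12, 16]
```

Because the pattern can only match a digit run or a dash, comparing against '-' is enough here; the isdigit check in the answer does the same job from the other side.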
Solution 3:[3]
Useful functions:
str.isdigit (or str.isnumeric or str.isdecimal);
itertools.groupby to group adjacent characters that share a characteristic.
from itertools import groupby
def tokenize_digits_and_dashes(s):
    for k, g in groupby(s, key=lambda c: (c.isdigit(), c == '-')):
        if k == (True, False):
            yield int(''.join(g))
        elif k == (False, True):
            yield '-'
print(list(tokenize_digits_and_dashes('4-6,10-12,16')))
# [4, '-', 6, 10, '-', 12, 16]
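To see why the keys (True, False) and (False, True) pick out digits and dashes, this small sketch lists every group that groupby produces for the same key function. The variable name groups is only for illustration.

```python
from itertools import groupby

s = '4-6,10-12,16'
# Pair each key with the characters of its group joined back into a string.
groups = [(key, ''.join(chars))
          for key, chars in groupby(s, key=lambda c: (c.isdigit(), c == '-'))]
for key, text in groups:
    print(key, repr(text))
# (True, False) '4'
# (False, True) '-'
# (True, False) '6'
# (False, False) ','
# (True, False) '10'
# (False, True) '-'
# (True, False) '12'
# (False, False) ','
# (True, False) '16'
```

The commas fall into the (False, False) group, which the generator above skips. One consequence worth knowing: consecutive dashes such as '--' share a key and form a single group, so the generator would yield only one '-' for them.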
Alternative approach
Your string already contains separators in the form of commas ,. These are useful! Don't ignore them. You can split the list on the separators using str.split.
def tokenize_intervals(s):
    for interval in s.split(','):
        i = interval.split('-')
        if len(i) == 2:
            yield tuple(int(w) for w in i)
        elif len(i) == 1:
            x = int(i[0])
            yield (x, x)
print(list(tokenize_intervals('4-6,10-12,16')))
# [(4, 6), (10, 12), (16, 16)]
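If the flat list from the question is still needed, the tuples can be turned back into it. This is a rough sketch, not part of the original answer: flatten_intervals is a hypothetical helper, and it assumes that a pair with equal endpoints came from a single number, so an input like '5-5' would come back as just 5.

```python
def flatten_intervals(pairs):
    """Turn (start, end) tuples into a flat list of ints and '-' strings."""
    out = []
    for start, end in pairs:
        if start == end:
            # Assumption: equal endpoints mean a lone number such as '16'.
            out.append(start)
        else:
            out.extend([start, '-', end])
    return out

print(flatten_intervals([(4, 6), (10, 12), (16, 16)]))
# [4, '-', 6, 10, '-', 12, 16]
```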
Solution 4:[4]
# By Using Regex #
# -------------- #
import re
interval = '4-6,10-12,16'
s_list = re.findall(r'\d+|-', interval)  # [\d+]+ would also match literal '+' characters
x = [int(_) if _.isnumeric() else _ for _ in s_list]
print(x)
# By Using the split method #
# ------------------------- #
final_list = []
for part in interval.split(','):
    sub_list = part.split('-')
    for idx, i in enumerate(sub_list):
        if i.isnumeric():
            final_list.append(int(i))
        # add a dash after every element except the last one in this part
        if idx < len(sub_list) - 1:
            final_list.append('-')
print(final_list)
# By Checking Character By Character #
# ---------------------------------- #
z = ""
s = []
count = 0
for _ in interval:
    count += 1
    if _.isnumeric():
        z += _
        if count == len(interval):
            s.append(int(z))
    elif _ == '-':
        s.append(int(z))
        z = ""
        s.append('-')
    else:
        s.append(int(z))
        z = ""
print(s)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Wiktor Stribiżew |
| Solution 3 | |
| Solution 4 | Pulakesh Dhara |
