Regular expression operations (re): how to remove periods?
I'm working with a function I wrote to split the sample line below and remove the standalone numerical values (such as 123). However, it is also removing the trailing digits from tokens such as TT1, which I need to keep. I also can't figure out how to remove the "0.0".
ABC/0.0/123/TT1/1TT//
import re

cleaned_data = []

def split_lines(lines, delimiter, remove='[0-9]+$'):
    for line in lines:
        tokens = line.split(delimiter)
        tokens = [re.sub(remove, "", token) for token in tokens]
        clean_list = list(filter(lambda e: e.strip(), tokens))
        cleaned_data.append(clean_list)
        print(clean_list)

split_lines(lines, "/")
What comes out now is below. Notice the "0." left over from "0.0", and the "TT" that is missing its trailing 1.
['ABC', '0.', 'TT', '1TT']
Solution 1:[1]
Try including the start-of-line anchor (^) as well.
import re

cleaned_data = []

def split_lines(lines, delimiter, remove='^[0-9.]+$'):
    for line in lines:
        tokens = line.split(delimiter)
        # Blank out tokens made up entirely of digits and periods
        tokens = [re.sub(remove, "", token) for token in tokens]
        # Drop tokens that are empty or whitespace-only
        clean_list = list(filter(lambda e: e.strip(), tokens))
        cleaned_data.append(clean_list)
        print(clean_list)

split_lines(lines, "/")
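I simply changed the default value of the remove parameter to '^[0-9.]+$', which matches only when the entire token consists of digits and periods. Tokens such as TT1 no longer match, so their trailing digits are left alone, while "0.0" and "123" are removed completely.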
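To see the effect of the anchors, here is a minimal sketch that runs both patterns over the sample line from the question. The names `UNANCHORED`, `ANCHORED` and `strip_tokens` are illustrative and do not appear in the original post.

```python
import re

UNANCHORED = r"[0-9]+$"   # the question's default: strips any trailing digits
ANCHORED = r"^[0-9.]+$"   # Solution 1: matches only tokens made of digits/periods

def strip_tokens(line, delimiter, pattern):
    """Split line, apply re.sub to each token, and drop tokens left empty."""
    tokens = [re.sub(pattern, "", tok) for tok in line.split(delimiter)]
    return [tok for tok in tokens if tok.strip()]

sample = "ABC/0.0/123/TT1/1TT//"
print(strip_tokens(sample, "/", UNANCHORED))  # ['ABC', '0.', 'TT', '1TT']
print(strip_tokens(sample, "/", ANCHORED))    # ['ABC', 'TT1', '1TT']
```

Note that the character class accepts any mix of digits and periods, so a token such as "1.2.3" or "..." would be removed as well. If that matters for your data, a stricter pattern would be needed.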
Solution 2:[2]
Do you really need regular expressions? This job is much simpler if you just use str.split() and try to convert the resulting values to float:
def split_lines_remove_numeric(lines, delimiter):
    for line in lines:
        clean_list = []
        for item in line.split(delimiter):
            if not item:
                continue  # Skip this item if it's empty
            try:
                # Convert to float
                float(item)
            except ValueError:  # Enter this block if conversion threw an error
                clean_list.append(item)
        print(clean_list)
Then, calling this function removes the values you want:
>>> split_lines_remove_numeric(["ABC/0.0/123/TT1/1TT//"], "/")
['ABC', 'TT1', '1TT']
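One caveat with this approach: float() accepts more than plain decimal strings. Scientific notation, signed numbers, "nan" and "inf" all parse successfully, so such tokens would be dropped too. The sketch below illustrates this with a variant that returns the list instead of printing it. The name `remove_numeric_tokens` and the second input line are made up for illustration.

```python
def remove_numeric_tokens(line, delimiter):
    """Return the non-empty tokens of line that float() cannot parse."""
    kept = []
    for item in line.split(delimiter):
        if not item:
            continue  # skip empty tokens produced by repeated delimiters
        try:
            float(item)
        except ValueError:
            kept.append(item)  # not numeric, so keep it
    return kept

print(remove_numeric_tokens("ABC/0.0/123/TT1/1TT//", "/"))  # ['ABC', 'TT1', '1TT']
print(remove_numeric_tokens("1e5/nan/inf/-2/X9", "/"))      # ['X9']
```

If tokens such as "nan" or "1e5" can occur in your data and must be kept, the regular expression from Solution 1 may be the safer choice.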
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mark |
| Solution 2 | Pranav Hosangadi |
