Separate string by multiple separators and return separators and separated strings

I want to split strings on separators that consist of more than one character. The separators are stored in the variable sep_list.

My aim is to get the last separated string s1 together with the last separator, i.e. the one that has s1 on its right-hand side.

sep_list = ['→E', '¬E', '↓I']

string1 = "peter →E tom ¬E luis ↓I ed"
string2 = "sigrid →E jose l. ¬E jose t."

Applied to string1, the algorithm should return the string:

"↓I, ed"

and applied to string2, it should return the string:

"¬E, jose t."

How can I do that in Python?



Solution 1:[1]

Another way to do so using regex:

import re

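sep_list = ['→E', '¬E', '↓I']

string1 = "peter →E tom ¬E luis ↓I ed"
string2 = "sigrid →E jose l. ¬E jose t."

def separate_string(data, seps):
    # Escape each separator so it is matched literally, then join with | (alternation).
    pattern = "|".join(re.escape(sep) for sep in seps)
    spans = [m.span() for m in re.finditer(pattern, data)]
    if not spans:
        return None  # no separator occurs in data
    # Start and end index of the last separator in the string.
    start, end = spans[-1]

    return f"{data[start:end]},{data[end:]}"

print(separate_string(string1, sep_list))  # ↓I, ed
print(separate_string(string2, sep_list))  # ¬E, jose t.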

  • We create a regex pattern by escaping each separator with re.escape and joining them with |.
  • For each match in the string, we use m.span() to retrieve the start and end of the match. We only keep the last match.
  • data[start:end] is the separator, while data[end:] is everything after.
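
One detail of the | pattern that may matter with other separator lists: regex alternation tries the alternatives from left to right, so a short separator listed first can shadow a longer one that starts with it. The following is only a sketch of one way to guard against that, by sorting the separators longest first. The names build_pattern and last_separator_and_tail and the extra separator '→EE' are made up for illustration and are not part of the question.

```python
import re

def build_pattern(seps):
    # Longest separators first: alternation picks the first alternative that
    # matches, so '→E' listed before '→EE' would win even where '→EE' is meant.
    ordered = sorted(seps, key=len, reverse=True)
    return re.compile("|".join(re.escape(sep) for sep in ordered))

def last_separator_and_tail(data, seps):
    # Hypothetical helper: returns (separator, text after it) or None.
    matches = list(build_pattern(seps).finditer(data))
    if not matches:
        return None
    last = matches[-1]
    return last.group(), data[last.end():].strip()

print(last_separator_and_tail("peter →E tom ¬E luis ↓I ed", ['→E', '¬E', '↓I']))
# ('↓I', 'ed')
print(last_separator_and_tail("a →E b →EE c", ['→E', '→EE']))
# ('→EE', 'c')
```

With the separators from the question no separator is a prefix of another, so the sorting changes nothing there.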

Solution 2:[2]

Assuming the separators may exist in any order (or not at all), you could do this:

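sep_list = ['→E', '¬E', '↓I']

string1 = "peter →E tom ¬E luis ↓I ed"
string2 = "sigrid →E jose l. ¬E jose t."

def process(s):
    indexes = []
    for sep in sep_list:
        # rfind returns the last occurrence, so a separator that appears
        # more than once is handled too (find would return the first one).
        if (index := s.rfind(sep)) >= 0:
            indexes.append((index, sep))
    if indexes:
        # The largest index belongs to the last separator in the string.
        index, sep = max(indexes)
        return f"{sep},{s[index + len(sep):]}"
    return None  # no separator found

print(process(string1))
print(process(string2))

Output:

↓I, ed
¬E, jose t.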

Solution 3:[3]

This solution does not need the re module. It splits the string on whitespace, so it assumes that every separator is surrounded by whitespace.

def run(string):
    sep_lst = ['→E', '¬E', '↓I']
    tokens = string.split()
    result = None
    for i, token in enumerate(tokens):
        if token in sep_lst:
            # Overwritten on every hit, so the last separator wins.
            result = f'{token}, {" ".join(tokens[i+1:])}'
    return result

print(run("peter →E tom ¬E luis ↓I ed"))
print(run("sigrid →E jose l. ¬E jose t."))

Output:

↓I, ed
¬E, jose t.
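
Since only the last separator is needed, the same token-based idea can also scan from the right and stop at the first separator it meets. This is just a sketch of that variation. The name scan_from_right is made up for illustration, and like the code above it relies on the separators being standalone whitespace-delimited tokens.

```python
def scan_from_right(string, seps):
    # Hypothetical helper: walk the tokens backwards and return as soon as
    # a separator is found, which is then the last one in the string.
    tokens = string.split()
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in seps:
            return f'{tokens[i]}, {" ".join(tokens[i + 1:])}'
    return None  # no separator token in the string

seps = ['→E', '¬E', '↓I']
print(scan_from_right("peter →E tom ¬E luis ↓I ed", seps))    # ↓I, ed
print(scan_from_right("sigrid →E jose l. ¬E jose t.", seps))  # ¬E, jose t.
```

Note that split() followed by " ".join collapses runs of whitespace in the returned tail, so the result is not always a verbatim slice of the input.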

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Cubix48
Solution 2 Albert Winestein
Solution 3