I want to replace letters/words, but I am facing challenges in one aspect of my code.

I will be using lloll as an example word.

Here's my code:

mapping = {'ll':'o','o':'ll'}
string = 'lloll'
out = ' '.join(mapping.get(s,s) for s in string.split())
print(out)

The output should be ollo, but I get lloll. When I write ll o ll, it works, but I don't want spaces in between the ll and the o, and I don't want to do something like mapping = {'lloll':'ollo'}.



Solution 1:[1]

Not sure if this accounts for all edge cases (what about overlapping matches?), but my current idea is to re.split the string by the mapping's keys and then apply the mapping.

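import re

mapping = {'ll':'o','o':'ll'}
string = 'lloll'
# the capture group keeps the matched keys in the re.split output;
# re.escape guards against keys that contain regex metacharacters
choices = f'({"|".join(map(re.escape, mapping))})'
result = ''.join(mapping.get(s, s) for s in re.split(choices, string))
print(result)  # ollo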
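
On the overlapping-matches question, one possible variant is sketched here; it is an assumption, not part of the answer above. It swaps re.split for re.sub with a callback. The name replace_all is hypothetical. Keys are escaped with re.escape and tried longest first, so a key such as ll wins over a shorter key such as l at the same position.

```python
import re


def replace_all(text: str, mapping: dict) -> str:
    """Replace every key of `mapping` found in `text` in a single pass.

    `replace_all` is a hypothetical helper name, not part of the answer above.
    """
    if not mapping:
        return text
    # Longest keys first: regex alternation takes the first alternative that matches.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The callback looks up whichever key matched at this position.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(replace_all("lloll", {"ll": "o", "o": "ll"}))  # ollo
```

Because the scan is a single left-to-right pass, text that has already been replaced is never scanned again, which is what lets ll and o swap places.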

Solution 2:[2]

Using Template.substitute

I would find each occurrence of the substrings to be replaced and prefix each one with a $. Then use substitute(mapping) from string.Template to effectively do a global find-and-replace on them all.

Most importantly in this approach, you can control the way that potentially overlapping mappings are handled by using sorted() or reversed() with a key to sort the order in which they are applied.

findall then split_string

This way you also get two reusable generator functions: findall, which yields the index of each occurrence of a substring, and split_string, which splits a string into segments at given indices. They may help with whatever larger task you are performing. If they aren't valuable, there is a shorter version at the bottom.

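Because it uses generators throughout, it avoids building intermediate lists, so it should be reasonably fast and memory efficient, although each pass over a key still joins the pieces into a new string.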

from itertools import chain, repeat
from string import Template


def CharReplace(string: str, map):
    if ("$" in map) or ("$" in string):
        raise ValueError("Cannot handle anything with a $ sign")
    
    for old in map:  # "old" is a key in the mapping
        locations = findall(string, old)  # index of each occurrence of "old"
        bits = split_string(string, locations)  # segments; each one after the first starts with "old"
        template = zip(bits, repeat("$"))  # tuples: (segment of the string, "$")
        # chain(*template) flattens the tuples, putting a "$" in front of each
        # occurrence of "old"; [:-1] strips the extra "$" left at the end
        string = ''.join(chain(*template))[:-1]

    template = Template(string)
    string = template.substitute(map) #substitute replaces substrings which follow "$" based on a mapping

    return string


def findall(string: str, sub: str):
    i = string.find(sub)
    while i != -1:
        yield i
        i = string.find(sub, i + len(sub))

def split_string(string: str, indices):
    start = 0  # indices are absolute positions in the original string
    for i in indices:
        yield string[start:i]
        start = i
    yield string[start:]

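This approach will not handle any strings with "$" in them without some extra code to escape them. It also relies on Template's placeholder rules: each key must be a valid identifier (ASCII letters, digits and underscores, not starting with a digit), and a key that is directly followed by more identifier characters is read as one longer name, so llama becomes $llama and substitute raises a KeyError.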
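
As a rough idea of what the extra escaping code could look like: Template treats $$ as a literal dollar sign, so doubling every $ before the delimiters are added should be enough. This is a minimal sketch under that assumption. The helper name escape_dollars is hypothetical, str.replace stands in for the delimiter step, and the "$" check on the input string would have to be dropped for it to apply.

```python
from string import Template


def escape_dollars(text: str) -> str:
    """Double every "$" so that Template reads it as a literal dollar sign."""
    return text.replace("$", "$$")


mapping = {"ll": "o"}
escaped = escape_dollars("5$ for ll")   # "5$$ for ll"
# Stand-in for the delimiter step: put a "$" in front of each key occurrence.
marked = escaped.replace("ll", "$ll")   # "5$$ for $ll"
print(Template(marked).substitute(mapping))  # 5$ for o
```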

It runs through the string from front to back, one key at a time, in whatever order the dict iterates over them. You could apply some form of sorted() to the keys in the line for old in map in order to handle keys in a specific order (e.g. longest first, or alphabetically).
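
For illustration only, the two orderings mentioned could be produced like this. The key lll is an extra entry added for the example and is not part of the original mapping. The sketch shows only the order in which the keys would be visited, not the replacement itself.

```python
# "lll" is an extra key added for illustration; it is not in the original mapping.
mapping = {"o": "ll", "ll": "o", "lll": "x"}

longest_first = sorted(mapping, key=len, reverse=True)
alphabetical = sorted(mapping)

print(longest_first)  # ['lll', 'll', 'o']
print(alphabetical)   # ['ll', 'lll', 'o']
```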

It handles repeated occurrences of a key without overlap, so llll is recognised as ll ll and lll as ll l.
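
The non-overlapping scan can be checked in isolation. The sketch below restates the search loop under a hypothetical name, find_non_overlapping, and adds a guard for an empty substring, which is an addition of this sketch.

```python
def find_non_overlapping(text: str, sub: str):
    """Yield the start index of each non-overlapping occurrence of `sub`."""
    if not sub:
        # An empty substring would match forever at the same position.
        raise ValueError("sub must not be empty")
    i = text.find(sub)
    while i != -1:
        yield i
        # Resume after the end of this match, so matches never overlap.
        i = text.find(sub, i + len(sub))


print(list(find_non_overlapping("llll", "ll")))  # [0, 2]
print(list(find_non_overlapping("lll", "ll")))   # [0]
```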

In your original case this turns the original string first into $llo$ll, then into $ll$o$ll, before using substitute to get ollo.
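
The last step of that walkthrough can be tried directly with string.Template. The sketch also shows what happens when a key is followed by more letters, and the braced placeholder form, which is offered here as a possible workaround rather than something the code above uses.

```python
from string import Template

mapping = {"ll": "o", "o": "ll"}

# The fully delimited string from the walkthrough.
print(Template("$ll$o$ll").substitute(mapping))  # ollo

# A placeholder name runs on for as long as identifier characters follow,
# so "$llama" is looked up as the key "llama", which is not in the mapping.
try:
    Template("$llama").substitute(mapping)
except KeyError as err:
    print("missing key:", err)

# Braces mark where the placeholder name ends.
print(Template("${ll}ama").substitute(mapping))  # oama
```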

Single Generator to add delimiters

If you prefer your code to be shorter:

def CharReplace2(string: str, map):
    if ("$" in map) or ("$" in string):
        raise ValueError("Cannot handle anything with a $ sign")
    
    for old in map: #"old" is the key in the mapping
        string = ''.join(add_delimiters(string, old)) #add a $ before each occurrence of old
    template = Template(string)
    string = template.substitute(map) #substitute replaces substrings which follow "$" based on a mapping

    return string

def add_delimiters(string: str, sub: str):
    i = string.find(sub)
    while i != -1:
        yield string[:i]  # string up to the next occurrence of sub
        string = '$' + string[i:]  # add a dollar to the start of the rest of the string (which starts with sub)
        i = string.find(sub, 1 + len(sub))  # sub now sits at index 1, so search just past it
    yield string  # don't forget to yield the last bit of the string

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 timgeb
Solution 2