Is there a way to combine multiple re.sub operations into one to make it faster in Python?

I have a dataframe column whose values look like the input below.

Input = '{1:A06YCASDB2LXXXXX000000}{2:A303TYDBTM2AXXD}{3:{108:23158}}{4:\r\n:20:APS0182405\r\n:23B:DRED\r\n:32A:182349USD3280,00\r\n:33B:USD31280,00\r\n:52M:/73240222\r\nRAWR UK Ltd\r\n28 School Road\r\nfast\r\nCo. Angrid\r\n:57A:TETRIS\r\n:59:/BU500023231012000066241\r\nDUMMYNAME DUMMYLASTNAME\r\PLACE/REST\r\n:70:PA74536/39\r\n:71A:OUR\r\n-}

I have written a chained regex function that applies multiple re.sub operations:

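    import re

    def chainRegex(string):
        # Replace field tags such as ":20:" or ":23B:" with a space
        string = re.sub(r":\d{2}[A-Z]?:", " ", string)
        # Replace CRLF line endings with a space
        string = re.sub(r"\r\n", " ", string)
        # Strip non-letters from each token, then drop empty tokens
        string = [re.sub(r"([^a-zA-Z ]+?)", "", i) for i in string.split()]
        string = list(filter(None, string))
        return string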

The expected output is the list below.

output = ['AYCASDBLXXXXXATYDBTMAXXD', 'APS', 'DRED', 'USD', 'USD', 'RAWR', 'UK', 'Ltd', 'School','Road', 'fast', 'Co', 'Angrid', 'TETRIS', 'BU', 'DUMMYNAME', 'DUMMYLASTNAME', 'PLACEREST', 'PA', 'OUR']

Is there a way to combine these re.sub operations into one to make it faster, or is there a faster alternative? Parsing the message is not an option because the structure of the string is sometimes corrupted (missing {} or keys).

python regex

Solution 1:^[1]

You can use

    import re

    def chainRegex(string):
        x = re.sub(r"(?::\d{2}[A-Z]?:|\r\n)+", " ", string).split()
        return [w for w in ["".join(c for c in i if c.isalpha()) for i in x] if w != ""]

Here,

re.sub(r"(?::\d{2}[A-Z]?:|\r\n)+", " ", string).split() - matches every run of one or more consecutive occurrences of either a colon, two digits, an optional uppercase letter and a colon, or a CRLF line ending; replaces each run with a single space; then splits the result on whitespace
["".join(c for c in i if c.isalpha()) for i in x] - removes all non-letters from each word
[w for w in ... if w != ""] - omits the empty items.
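
To check that the combined version behaves like the original chain, and to measure whether it is actually faster on your data, something like the following sketch may help. The names SAMPLE, SEPARATORS, chain_regex_original, chain_regex_combined and time_both are illustrative and not part of the question or answer; SAMPLE is a shortened copy of the question's input. Compiling the pattern once with re.compile is an optional addition here. Note that str.isalpha() also keeps non-ASCII letters, whereas [^a-zA-Z] keeps ASCII letters only, so the two versions are only guaranteed to agree on ASCII input.

```python
import re
import timeit

# Shortened copy of the question's input; "\r\n" are real CR LF characters.
SAMPLE = (
    "{1:A06YCASDB2LXXXXX000000}{2:A303TYDBTM2AXXD}{3:{108:23158}}{4:\r\n"
    ":20:APS0182405\r\n:23B:DRED\r\n:32A:182349USD3280,00\r\n"
    ":52M:/73240222\r\nRAWR UK Ltd\r\n28 School Road\r\n:71A:OUR\r\n-}"
)

# Compiled once at import time instead of being looked up on every call.
SEPARATORS = re.compile(r"(?::\d{2}[A-Z]?:|\r\n)+")


def chain_regex_original(string):
    """The three-substitution version from the question."""
    string = re.sub(r":\d{2}[A-Z]?:", " ", string)
    string = re.sub(r"\r\n", " ", string)
    words = [re.sub(r"[^a-zA-Z ]+?", "", w) for w in string.split()]
    return list(filter(None, words))


def chain_regex_combined(string):
    """The single-substitution version from the answer, precompiled."""
    words = SEPARATORS.sub(" ", string).split()
    cleaned = ("".join(c for c in w if c.isalpha()) for w in words)
    return [w for w in cleaned if w]


def time_both(repeats=1000):
    """Return (original_seconds, combined_seconds) for `repeats` calls each."""
    t_original = timeit.timeit(lambda: chain_regex_original(SAMPLE), number=repeats)
    t_combined = timeit.timeit(lambda: chain_regex_combined(SAMPLE), number=repeats)
    return t_original, t_combined


result = chain_regex_combined(SAMPLE)
same = result == chain_regex_original(SAMPLE)
```

Calling time_both() on a representative sample of your own column values gives a rough comparison; the actual speedup depends on string length and Python version, so it is worth measuring rather than assuming.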

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Wiktor Stribiżew
