'Function to create new column with substring based on regex condition of another string column - python

I have a string column and would like to create a function to extract parts of the string based on some conditions of the original string column

EMAIL               NUM_ID    
[email protected]       a9927345A
[email protected]     B2722144X 
[email protected]   A2822876H 
[email protected]    b6722111A
[email protected]      X8923314X

Would like to create NEW_NUM_ID based on the first letter of NUM_ID. If NUM_ID has "A" or "a" for the first alphabet, then the NEW_NUM_ID would pick the last 5 characters If NUM_ID has "B" or "b" for the first alphabet, then the NEW_NUM_ID would be the original string If NUM_ID has "X or "x" for the first alphabet, then the NEW_NUM_ID would be the original string

EMAIL               NUM_ID        NEW_NUM_ID
[email protected]       a9927345A     7345A 
[email protected]     B2722144X     B2722144X
[email protected]   A2822876H     2876H
[email protected]    b6722111A     B6722111A
[email protected]      X8923314X     X8923314X

I have created the following code but can't seem to get it.

#Function to create the NEW_NUM_ID

def create_new_id(number_id):
    match = re.findall(r'^[a-zA-Z].*', number_id)
    if match[0] == 'A':
        return number_id[-5:]
    elif match[0] == 'B':
        return number_id
    elif match[0] == 'X':
        return number_id
    else:
        return 'NA'

df['NEW_NUM_ID'] = df['NUM_ID'].apply(create_new_id)

Appreciate any form of help I can get, thank you.



Solution 1:[1]

Using np.where we can try:

df["NEW_NUM_ID"] = np.where(df["NUM_ID"].str.contains(r'^[bBxX]'),
                            df["NUM_ID"],                           # original
                            df["NUM_ID"].str.extract(r'(.{5})$'))   # last 5

Solution 2:[2]

I tried this and it works too

def create_iden_no(id_number):
    match = re.search(r'^(\w){1}', id_number)
    if match.group(1) == 'A':
        return id_number[-5:]
    elif match.group(1) == 'B':
        return id_number
    elif match.group(1) == 'X':
        return id_number
    else:
        return 'NA'

df['NEW_NUM_ID'] = df['NUM_ID'].apply(create_iden_no)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Biegeleisen
Solution 2 wjie08