'Function to create new column with substring based on regex condition of another string column - python
I have a string column and would like to create a function to extract parts of the string based on some conditions of the original string column
EMAIL NUM_ID
[email protected] a9927345A
[email protected] B2722144X
[email protected] A2822876H
[email protected] b6722111A
[email protected] X8923314X
Would like to create NEW_NUM_ID based on the first letter of NUM_ID. If NUM_ID has "A" or "a" for the first alphabet, then the NEW_NUM_ID would pick the last 5 characters If NUM_ID has "B" or "b" for the first alphabet, then the NEW_NUM_ID would be the original string If NUM_ID has "X or "x" for the first alphabet, then the NEW_NUM_ID would be the original string
EMAIL NUM_ID NEW_NUM_ID
[email protected] a9927345A 7345A
[email protected] B2722144X B2722144X
[email protected] A2822876H 2876H
[email protected] b6722111A B6722111A
[email protected] X8923314X X8923314X
I have created the following code but can't seem to get it.
#Function to create the NEW_NUM_ID
def create_new_id(number_id):
match = re.findall(r'^[a-zA-Z].*', number_id)
if match[0] == 'A':
return number_id[-5:]
elif match[0] == 'B':
return number_id
elif match[0] == 'X':
return number_id
else:
return 'NA'
df['NEW_NUM_ID'] = df['NUM_ID'].apply(create_new_id)
Appreciate any form of help I can get, thank you.
Solution 1:[1]
Using np.where we can try:
df["NEW_NUM_ID"] = np.where(df["NUM_ID"].str.contains(r'^[bBxX]'),
df["NUM_ID"], # original
df["NUM_ID"].str.extract(r'(.{5})$')) # last 5
Solution 2:[2]
I tried this and it works too
def create_iden_no(id_number):
match = re.search(r'^(\w){1}', id_number)
if match.group(1) == 'A':
return id_number[-5:]
elif match.group(1) == 'B':
return id_number
elif match.group(1) == 'X':
return id_number
else:
return 'NA'
df['NEW_NUM_ID'] = df['NUM_ID'].apply(create_iden_no)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Biegeleisen |
| Solution 2 | wjie08 |
