How to select a particular string from a column value and assign it to a new column in pandas

docs = ['doc1', 'doc2', 'doc3']

nurses = ['nur1', 'nur2', 'nur3']

1st responder  Associates
doc1           nur1, nur2
nur1           doc1, nur2
nur3           nur1, nur2, doc2
doc2           nur2, nur1

I want to add a new column 'Doctor' that holds the doctor found in either '1st responder' or 'Associates', so that the resulting DataFrame looks like:

1st responder  Associates        Doctor
doc1           nur1, nur2        doc1
nur1           doc1, nur2        doc1
nur3           nur1, nur2, doc2  doc2
doc2           nur2, nur1        doc2

Should I use apply/map or is there a vectorised method?



Solution 1:[1]

Keep the values of '1st responder' that are in the list docs, and fill the remaining rows with the result of str.extract on the 'Associates' column:

df['Doctor'] = (
    df['1st responder']
      .where(lambda x: x.isin(docs), 
             other = df['Associates'].str.extract(pat='('+'|'.join(docs)+')')[0])
)
print(df)
#   1st responder        Associates Doctor
# 0          doc1        nur1, nur2   doc1
# 1          nur1        doc1, nur2   doc1
# 2          nur2  nur3, nur4, doc2   doc2
# 3          doc2              nur4   doc2

Input used:

import pandas as pd

df = pd.DataFrame({
    '1st responder': ['doc1','nur1','nur2','doc2'],
    'Associates' : ['nur1, nur2', 'doc1, nur2', 'nur3, nur4, doc2', 'nur4']
})
docs = ['doc1','doc2', 'doc3']
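
The same idea as a self-contained script, in case it helps to run it end to end. This is a sketch, not part of the original answer: it assumes the names in docs should be matched literally and as whole words, so it escapes them with re.escape and wraps the alternation in \b word boundaries. That guards against a name such as a hypothetical 'doc10' being reported as 'doc1'.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "1st responder": ["doc1", "nur1", "nur2", "doc2"],
    "Associates": ["nur1, nur2", "doc1, nur2", "nur3, nur4, doc2", "nur4"],
})
docs = ["doc1", "doc2", "doc3"]

# Assumption: doctor names are literal strings that must match as whole words.
pattern = r"\b(" + "|".join(re.escape(name) for name in docs) + r")\b"

# First doctor name found in 'Associates' (NaN when there is none).
from_associates = df["Associates"].str.extract(pattern, expand=False)

# Keep '1st responder' where it is a doctor, otherwise fall back to the
# name extracted from 'Associates'.
is_doctor = df["1st responder"].isin(docs)
df["Doctor"] = df["1st responder"].where(is_doctor, from_associates)

print(df)
```

Rows where neither column contains a doctor would end up with NaN in 'Doctor'.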

Solution 2:[2]

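You can split 'Associates' into lists, melt both columns into a single 'value' column, explode the lists, keep only the values found in docs, and take the first match for each original row: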

docs = ['doc1', 'doc2', 'doc3']

df['Doctor'] = df.assign(Associates=df['Associates'].str.split(', ')) \
                 .melt(ignore_index=False).explode('value') \
                 .query('value.isin(@docs)').groupby(level=0)['value'].first()

Output:

1st responder  Associates        Doctor
doc1           nur1, nur2        doc1
nur1           doc1, nur2        doc1
nur3           nur1, nur2, doc2  doc2
doc2           nur2, nur1        doc2
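
Since the question also asks about apply/map, here is a minimal row-wise sketch for comparison. It is not from either answer: the helper name first_doctor is hypothetical, and it assumes 'Associates' is always separated by ', '. It scans '1st responder' first and then each name in 'Associates'. It will usually be slower than the vectorised versions on large frames, but it can serve as a readable reference to check them against.

```python
import pandas as pd

df = pd.DataFrame({
    "1st responder": ["doc1", "nur1", "nur3", "doc2"],
    "Associates": ["nur1, nur2", "doc1, nur2", "nur1, nur2, doc2", "nur2, nur1"],
})
docs = {"doc1", "doc2", "doc3"}  # a set gives fast membership tests


def first_doctor(row):
    """Return the first doctor among the responder and the associates, or None."""
    # Assumption: associates are separated by a comma followed by one space.
    candidates = [row["1st responder"], *row["Associates"].split(", ")]
    return next((name for name in candidates if name in docs), None)


# axis=1 passes each row to the helper as a Series.
df["Doctor"] = df.apply(first_doctor, axis=1)

print(df)
```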

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Ben.T
Solution 2  Corralien