How to select a particular string from a column value and assign it to a new column in pandas
Given these lists:
```python
docs = ['doc1', 'doc2', 'doc3']
nurses = ['nur1', 'nur2', 'nur3']
```
and this DataFrame `df`:
| 1st responder | Associates |
|---|---|
| doc1 | nur1, nur2 |
| nur1 | doc1, nur2 |
| nur3 | nur1, nur2, doc2 |
| doc2 | nur2, nur1 |
I want to add a new column `Doctor` containing the doctor taken from either `1st responder` or `Associates`, so that the resulting `df` looks like this:
| 1st responder | Associates | Doctor |
|---|---|---|
| doc1 | nur1, nur2 | doc1 |
| nur1 | doc1, nur2 | doc1 |
| nur3 | nur1, nur2, doc2 | doc2 |
| doc2 | nur2, nur1 | doc2 |
Should I use `apply`/`map`, or is there a vectorised method?
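For reference, the question's data can be rebuilt as a runnable DataFrame, and a row-wise (non-vectorised) baseline with `DataFrame.apply(..., axis=1)` might look like the sketch below. The helper name `first_doctor` is hypothetical, and the sketch assumes names in `Associates` are always separated by `', '`. It checks `1st responder` first and then the associates, returning `None` when a row has no doctor.

```python
import pandas as pd

docs = ['doc1', 'doc2', 'doc3']

# Rebuild the question's example table
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur3', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur1, nur2, doc2', 'nur2, nur1'],
})


def first_doctor(row):
    """Hypothetical helper: first doctor in '1st responder', then 'Associates'."""
    # Assumes associates are separated by exactly ', '
    candidates = [row['1st responder']] + row['Associates'].split(', ')
    for name in candidates:
        if name in docs:
            return name
    return None  # no doctor found in this row


# Row-wise baseline: Python-level loop over rows, so slower on large frames
df['Doctor'] = df.apply(first_doctor, axis=1)
print(df)
```

This works, but it calls a Python function once per row; the answers below avoid that with vectorised string and reshaping methods.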
Solution 1:[1]
Keep the values of `1st responder` that are in the list `docs` (using `isin`), and fill the remaining values with the result of `str.extract` on the `Associates` column:
```python
df['Doctor'] = (
    df['1st responder']
    .where(lambda x: x.isin(docs),
           other=df['Associates'].str.extract(pat='(' + '|'.join(docs) + ')')[0])
)
print(df)
#   1st responder        Associates Doctor
# 0          doc1        nur1, nur2   doc1
# 1          nur1        doc1, nur2   doc1
# 2          nur2  nur3, nur4, doc2   doc2
# 3          doc2              nur4   doc2
```
Input used:
```python
import pandas as pd

df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur2', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur3, nur4, doc2', 'nur4'],
})
docs = ['doc1', 'doc2', 'doc3']
```
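To see what the `other` argument contributes, the `str.extract` step can be run on its own. This sketch also makes two additions that are not in the original answer: each name is passed through `re.escape`, and the alternation is wrapped in `\b` word boundaries, so that a name such as `doc10` would not be matched as `doc1`. Column `0` of the extracted frame holds the first match per row, or `NaN` when the row contains no doctor.

```python
import re

import pandas as pd

docs = ['doc1', 'doc2', 'doc3']
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur2', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur3, nur4, doc2', 'nur4'],
})

# Escaped alternation with word boundaries (additions, not in the original answer)
pattern = r'\b(' + '|'.join(map(re.escape, docs)) + r')\b'

# str.extract returns a DataFrame with one column per capture group;
# column 0 is the first doctor found in each row, NaN if none
extracted = df['Associates'].str.extract(pattern)[0]
print(extracted)

# Keep '1st responder' where it is a doctor, otherwise use the extracted name
df['Doctor'] = df['1st responder'].where(df['1st responder'].isin(docs), extracted)
print(df)
```

Because `where` aligns on the index, the fallback values from `extracted` land on the same rows they were extracted from.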
Solution 2:[2]
You can use:
```python
docs = ['doc1', 'doc2', 'doc3']
df['Doctor'] = (
    df.assign(Associates=df['Associates'].str.split(', '))
    .melt(ignore_index=False)
    .explode('value')
    .query('value in @docs')
    .groupby(level=0)['value'].first()
)
```
Output:
| 1st responder | Associates | Doctor |
|---|---|---|
| doc1 | nur1, nur2 | doc1 |
| nur1 | doc1, nur2 | doc1 |
| nur3 | nur1, nur2, doc2 | doc2 |
| doc2 | nur2, nur1 | doc2 |
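Breaking the chain into named steps shows the intermediate long-format frame. This is a sketch of the same idea, using boolean indexing with `isin` in place of `query`; the step variable names are illustrative. `melt(ignore_index=False)` requires pandas 1.1 or later. Because `melt` stacks all `1st responder` values before all `Associates` values while keeping the original row labels, `groupby(level=0).first()` gives priority to the first responder when both columns contain a doctor.

```python
import pandas as pd

docs = ['doc1', 'doc2', 'doc3']
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur3', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur1, nur2, doc2', 'nur2, nur1'],
})

# Step 1: turn each Associates string into a list of names
split = df.assign(Associates=df['Associates'].str.split(', '))

# Step 2: long format with columns 'variable' and 'value';
# ignore_index=False keeps the original row labels (0..3, repeated)
long = split.melt(ignore_index=False)

# Step 3: one name per row; list entries are expanded, scalars stay as-is
exploded = long.explode('value')
print(exploded)

# Step 4: keep doctors only, then take the first one per original row label
doctors = exploded[exploded['value'].isin(docs)]
df['Doctor'] = doctors.groupby(level=0)['value'].first()
print(df)
```

Rows with no doctor in either column are simply absent from `doctors`, so the assignment aligns on the index and leaves `NaN` there.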
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ben.T |
| Solution 2 | Corralien |
