'Pandas : Create new column based on text value of another column
This might be very simple question, but here's my dataframe:
id text position labels
0 39088 skin melanoma [58.0, 71.0] indication
1 39088 proteinase [137.0, 147.0] protein
2 39088 plasminogen activator [170.0, 191.0] protein
3 39088 NaN [nan, nan] NaN
4 39088 NaN [nan, nan] NaN
5 39088 proteinase substrates [36.0, 57.0] protein
6 39088 tumors [67.0, 73.0] indication
7 39088 NaN [nan, nan] NaN
8 39088 Melanoma [0.0, 8.0] indication
9 39088 EDTA [172.0, 176.0] protein
{'pmid': [39088,
39088,
39088,
39088,
39088,
39088,
39088,
39088,
39088,
39088],
'text': ['skin melanoma',
'proteinase',
'plasminogen activator',
nan,
nan,
'proteinase substrates',
'tumors',
nan,
'Melanoma',
'EDTA'],
'position': ['[58.0, 71.0]',
'[137.0, 147.0]',
'[170.0, 191.0]',
'[nan, nan]',
'[nan, nan]',
'[36.0, 57.0]',
'[67.0, 73.0]',
'[nan, nan]',
'[0.0, 8.0]',
'[172.0, 176.0]'],
'labels': ['indication',
'protein',
'protein',
nan,
nan,
'protein',
'indication',
nan,
'indication',
'protein']}
And here's the WANTED OUTPUT , where I want to create 2 new columns based on the values of the labels column, and put as values the corresponding text and position depending on whether they are indication or protein, and for the rest NaN.
id indication indication.position protein protein.position
0 39088 skin melanoma [58.0, 71.0] NaN [nan, nan]
1 39088 NaN [nan, nan] proteinase [137.0, 147.0]
2 39088 NaN [nan, nan] plasminogen activator [170.0, 191.0]
3 39088 NaN [nan, nan] NaN [nan, nan]
4 39088 NaN [nan, nan] NaN [nan, nan]
5 39088 NaN [nan, nan] proteinase substrates [36.0, 57.0]
6 39088 tumors [67.0, 73.0] NaN [nan, nan]
7 39088 NaN [nan, nan] NaN [nan, nan]
8 39088 Melanoma [0.0, 8.0] NaN [nan, nan]
9 39088 NaN [nan, nan] EDTA [172.0, 176.0]
What is the best way to do this? Can someone help?
Solution 1:[1]
You can use:
out = (df
.drop(columns=['position', 'result.value.labels'])
.join(
df.reset_index().dropna(subset=['result.value.labels'])
.pivot(index='index', columns='result.value.labels', values='position')
.reindex(df.index)
.fillna('[nan, nan]')
.add_suffix('.position')
)
)
output:
pmid result.value.text indication.position protein.position
0 39088 skin melanoma [58.0, 71.0] [nan, nan]
1 39088 proteinase [nan, nan] [137.0, 147.0]
2 39088 plasminogen activator [nan, nan] [170.0, 191.0]
3 39088 NaN [nan, nan] [nan, nan]
4 39088 NaN [nan, nan] [nan, nan]
5 39088 proteinase substrates [nan, nan] [36.0, 57.0]
6 39088 tumors [67.0, 73.0] [nan, nan]
7 39088 NaN [nan, nan] [nan, nan]
8 39088 Melanoma [0.0, 8.0] [nan, nan]
9 39088 EDTA [nan, nan] [172.0, 176.0]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
