'parse string into columns
I have a column called "name". It is of type string. It looks something like this :
| name |
|---|
| attr1=val1a; attr2=val2a; attr3=val3a; attr4=val4a |
| attr1=val1b; attr5=val5b; attr3=val3b; attr6=val6b |
I want to resolve this column into :
| attr1 | attr2 | attr3 | attr4 | attr5 | attr6 |
|---|---|---|---|---|---|
| val1a | val2a | val3a | val4a | Null | Null |
| val1b | Null | val3b | Null | val5b | val6b |
Can someone help me find a way to do this in python.
Thanks in advance!
Solution 1:[1]
You can split each string by ;, and for each split, split by = and convert it into a dict and then use pd.json_normalize:
new_df = pd.json_normalize(df['name'].str.split(r';\s*').apply(lambda lst: dict(i.split('=') for i in lst)))
Output:
>>> new_df
attr1 attr2 attr3 attr4 attr5 attr6
0 val1a val2a val3a val4a NaN NaN
1 val1b NaN val3b NaN val5b val6b
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
