pandas single column value to multiple column headers with formatted values
I am trying to convert a single column, extra, into three new columns. Each string in extra is formatted as <column name>: <column value(s)>, ..., <column name>: <column value(s)>, where column name becomes the new column header and column value(s) can be an arbitrary value such as a list, float, or string.
I am working with the following dataframe:
import pandas as pd
df = pd.DataFrame(
    {
        "subject": [1, 1],
        "extra": [
            "category: app, datasets: [\"X\", \"Y\"], acc: [0.8, 0.9]",
            "category: dev, datasets: [\"Z\", \"Y\"], acc: [0.7, 0.95]",
        ],
    }
)
Desired output:
   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]
Calling df.explode(["acc", "datasets"]) on that frame will then give the final desired result:
   subject category datasets   acc
0        1      app        X   0.8
0        1      app        Y   0.9
1        1      dev        Z   0.7
1        1      dev        Y  0.95
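The explode step can be checked in isolation. The sketch below rebuilds the intermediate frame by hand (the names `parsed` and `final` are illustrative, not from the question) and assumes pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns.

```python
import pandas as pd

# Hand-built stand-in for the intermediate frame shown above
parsed = pd.DataFrame(
    {
        "subject": [1, 1],
        "category": ["app", "dev"],
        "datasets": [["X", "Y"], ["Z", "Y"]],
        "acc": [[0.8, 0.9], [0.7, 0.95]],
    }
)

# Exploding several columns at once pairs their elements position by position,
# so the lists in each row must have matching lengths
final = parsed.explode(["acc", "datasets"])
print(final)
```

The original index is repeated for every exploded row; adding `ignore_index=True` to the call would renumber the rows from 0 instead.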
Solution 1:[1]
You can use pyyaml:
import re
import yaml
# The regex moves each "<name>:" onto its own line so the string parses as a YAML mapping
extracted_df = pd.json_normalize(df['extra'].apply(lambda x: yaml.load(re.sub(r',\s*(\w+:)', '\n\\1', x), Loader=yaml.SafeLoader)))
new_df = pd.concat([df.drop('extra', axis=1), extracted_df], axis=1)
Output:
>>> new_df
   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]
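If installing pyyaml is not an option, roughly the same parsing can be sketched with the standard library. This is an alternative to the answer's method, not part of it: the helper name `parse_extra` is hypothetical, and it assumes every value is either valid JSON (a list or number) or a bare string, and that no string value itself contains a comma followed by `word:`.

```python
import json
import re

import pandas as pd


def parse_extra(text):
    """Parse 'name: value, name: value' into a dict (hypothetical helper)."""
    result = {}
    # Split only at commas followed by a new 'name:' token,
    # so commas inside lists are left alone
    for part in re.split(r",\s*(?=\w+:)", text):
        key, _, raw = part.partition(":")
        raw = raw.strip()
        try:
            value = json.loads(raw)  # lists and numbers
        except json.JSONDecodeError:
            value = raw  # bare strings such as 'app'
        result[key.strip()] = value
    return result


df = pd.DataFrame(
    {
        "subject": [1, 1],
        "extra": [
            'category: app, datasets: ["X", "Y"], acc: [0.8, 0.9]',
            'category: dev, datasets: ["Z", "Y"], acc: [0.7, 0.95]',
        ],
    }
)

# Build one column per parsed key and align on the original index
parsed = pd.DataFrame(df["extra"].map(parse_extra).tolist(), index=df.index)
new_df = df.drop(columns="extra").join(parsed)
print(new_df)
```

One difference from the YAML route: a bare value that happens to be valid JSON, such as `true` or `null`, is converted by `json.loads` rather than kept as text.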
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
