Flatten JSON array into long form
I am having trouble flattening a JSON array into the format that I need. The JSON is fairly complex, with nested parts for various sections; a simplified version is below.
{
  "RESPONSE": {"@VersionID": "1.1", "@ResponseID": "A0001"},
  "SUMMARY": {
    "@PersonID": "Person01",
    "@_Name": "Attributes",
    "_DATA_SET": [
      {
        "@_Name": "Number of accounts",
        "@_Value": "27"
      },
      {
        "@_Name": "Average age of open accounts",
        "@_Value": "35"
      },
      {
        "@_Name": "Number of closed accounts",
        "@_Value": "4"
      }
    ]
  }
}
I have a dataset in which one column contains a JSON string like the one above in each row. For each row, I want to parse the SUMMARY section (specifically _DATA_SET) into long format so that I can eventually pivot each @_Name into its own column.
Current data:
row_id | json_example
1 | json_example1
2 | json_example2
Desired output:
row_id | Number of accounts | Average age of open accounts | Number of closed accounts
1 | 27 | 35 | 4
2 | 27 | 35 | 4
I have tried the following code, which parses json_example into several columns, one of which is the summary column (SUMMARY._DATA_SET), but I cannot figure out how to parse that column further into rows that I can then pivot into columns.
pd.json_normalize(df.json_example.apply(json.loads)) yields
RESPONSE.@VersionID | RESPONSE.@ResponseID | SUMMARY.@PersonID | SUMMARY.@_Name | SUMMARY._DATA_SET
[{'@_Name': 'Number of accounts'...
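One possible way to continue from that json_normalize output, offered as a minimal sketch rather than a definitive answer: explode the SUMMARY._DATA_SET column so that each list entry gets its own row, expand the resulting dicts into columns, and then pivot on @_Name. The sample frame and the names doc, flat, entries, long_df, and wide_df are assumptions made up for illustration; the sketch also assumes every row has a non-empty _DATA_SET list and that no @_Name repeats within a row.

```python
import json

import pandas as pd

# Illustrative sample data: two rows holding the same simplified JSON string.
doc = {
    "RESPONSE": {"@VersionID": "1.1", "@ResponseID": "A0001"},
    "SUMMARY": {
        "@PersonID": "Person01",
        "@_Name": "Attributes",
        "_DATA_SET": [
            {"@_Name": "Number of accounts", "@_Value": "27"},
            {"@_Name": "Average age of open accounts", "@_Value": "35"},
            {"@_Name": "Number of closed accounts", "@_Value": "4"},
        ],
    },
}
df = pd.DataFrame({"row_id": [1, 2], "json_example": [json.dumps(doc)] * 2})

# Same first step as in the question; .tolist() passes a plain list of dicts.
flat = pd.json_normalize(df["json_example"].apply(json.loads).tolist())
flat["row_id"] = df["row_id"].to_numpy()

# Long form: one row per _DATA_SET entry, still tagged with its row_id.
long_df = flat[["row_id", "SUMMARY._DATA_SET"]].explode(
    "SUMMARY._DATA_SET", ignore_index=True
)
entries = pd.json_normalize(long_df["SUMMARY._DATA_SET"].tolist())
long_df = pd.concat([long_df[["row_id"]], entries], axis=1)

# Wide form: each @_Name becomes a column, one row per row_id.
wide_df = long_df.pivot(
    index="row_id", columns="@_Name", values="@_Value"
).reset_index()
wide_df.columns.name = None  # drop the leftover "@_Name" axis label

print(long_df)
print(wide_df)
```

The values stay as strings because that is how they appear in the JSON; pd.to_numeric can convert them if numbers are needed. If an @_Name could repeat within a single row, pivot would raise an error, and pivot_table with an explicit aggfunc might be a better fit.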
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
