Flatten out nested JSON using Python
I'm currently trying to create an adapter that converts an undesirably explicit data model into one that is more generic and scalable, and I thought I would use Python for at least the prototype. I have identified the patterns, so once I know how to select specific groups of data individually, I will attempt to turn them into generic objects within an array, with some metadata parameters to make it work nicely.
Where I am stuck is in reading the file properly. I am dealing with a deeply nested JSON covering 1,700 questions across an entire business. I have created a smaller example file which represents the bare minimum for a spike.
What I want to do is extract the values from both commercial_situations.property_asset.coverages and commercial_operations.liability_asset.coverages into two lists/dataframes, and then put them into a generic object structure with metadata, inside an array in a new JSON schema.
{
  "line_of_businesses": [
    {
      "asset_groups": {
        "commercial_situations": [
          {
            "property_asset": {
              "coverages": {
                "contents": {
                  "basis_of_settlement": "REPLACEMENT",
                  "sum_insured": 10000
                }
              },
              "excess_details": {
                "excesses": [
                  {
                    "description": "Lorem Ipsum",
                    "imposed": 500,
                    "total": 500,
                    "type": "ACCIDENTAL_DAMAGE",
                    "unit": "CURRENCY",
                    "variable": 500,
                    "variable_origin": "BROKER"
                  }
                ]
              }
            }
          }
        ]
      },
      "assets": {
        "commercial_operations": [
          {
            "business_established_year": 2014,
            "occupation_details": {
              "consumer_occupation_code": "5243C",
              "insurer_occupation_code": "5243C"
            },
            "liability_asset": {
              "coverages": {
                "public_and_products_liability": {
                  "limit_of_liability": 100000
                }
              },
              "excess_details": {
                "excesses": [
                  {
                    "description": "Lorem Ipsum",
                    "imposed": 0,
                    "total": 500,
                    "type": "PROPERTY_DAMAGE",
                    "unit": "CURRENCY",
                    "variable": 500,
                    "variable_origin": "BROKER"
                  }
                ]
              },
              "notes": {
                "non_printable_notes": [
                  {
                    "description": "Description",
                    "note": "Lorem ipsum dolor sit amet, consectetur adipiscing",
                    "note_id": "512fd3a5-529a-4659-b483-6b0aba1a27a5"
                  }
                ],
                "printable_notes": [
                  {
                    "description": "Description",
                    "note": "Lorem ipsum dolor sit amet, consectetur adipiscing",
                    "note_id": "512fd3a5-529a-4659-b483-6b0aba1a27a5"
                  }
                ]
              }
            },
            "line_of_business_type": "COMMERCIAL_PACK"
          }
        ]
      },
      "opportunity_id": "aeda01d8-f48e-4bb6-a842-ba6e4f85dfe2",
      "thread_id": "c222cecf-af18-455a-9596-e76807d7c9be"
    }
  ]
}
This is as far as I have got towards making it usable:
import json

import pandas as pd

# load the raw JSON into a plain dict
with open(r'D:\path\origin_test_input.json') as f:
    d = json.load(f)

# flatten the top-level list into a dataframe
request = pd.json_normalize(d['line_of_businesses'])
request.head(3)

(Note: `from pandas.io.json import json_normalize` is deprecated in recent pandas and is not needed, since `pd.json_normalize` is called directly.)
That gets me to the point where I am looking at two values and two arrays. The next step is to go into the commercial_situations array, then into the property_asset object, and then into coverages. From there, I want to get the values from contents and list them as columns.
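One way to do that drilling-down in a single call is `pd.json_normalize` with `record_path` (to descend into the nested list) and `meta` (to carry the top-level ids along). A minimal sketch against a trimmed copy of the sample above; in real code `d` would come from `json.load` as in the question:

import pandas as pd

# trimmed inline copy of the sample, standing in for json.load(f)
d = {
    "line_of_businesses": [
        {
            "asset_groups": {
                "commercial_situations": [
                    {
                        "property_asset": {
                            "coverages": {
                                "contents": {
                                    "basis_of_settlement": "REPLACEMENT",
                                    "sum_insured": 10000,
                                }
                            }
                        }
                    }
                ]
            },
            "opportunity_id": "aeda01d8-f48e-4bb6-a842-ba6e4f85dfe2",
            "thread_id": "c222cecf-af18-455a-9596-e76807d7c9be",
        }
    ]
}

# record_path walks line_of_businesses -> asset_groups -> commercial_situations,
# giving one row per situation; meta appends the ids as extra columns
situations = pd.json_normalize(
    d["line_of_businesses"],
    record_path=["asset_groups", "commercial_situations"],
    meta=["opportunity_id", "thread_id"],
)

# keep only the coverage columns, e.g.
# property_asset.coverages.contents.basis_of_settlement / .sum_insured
contents = situations.filter(like="coverages.contents")
print(contents.columns.tolist())

The same call with `record_path=["assets", "commercial_operations"]` would produce the liability dataframe.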
So my issue is with deeply nested values. I have to repeat this process not only for contents but for every other building insurance type you can think of, so my script will produce a number of dataframes or lists which I will then aggregate into a generic object structure within an array.
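Since the repetition is the pain point, one alternative worth sketching is to walk the whole tree once and emit a generic record for every `coverages` object found, wherever it sits, with the path to it kept as metadata. The field names `coverage_name` and `source_path` here are my own invention, not part of the original schema:

# assumption: every coverage lives under a dict key literally named "coverages"
def extract_coverages(node, path=()):
    """Yield one generic dict per coverage entry, wherever it sits in the tree."""
    if isinstance(node, dict):
        if isinstance(node.get("coverages"), dict):
            for name, values in node["coverages"].items():
                yield {
                    "coverage_name": name,
                    "source_path": ".".join(path),  # metadata: where it came from
                    **values,                        # the coverage's own fields
                }
        for key, value in node.items():
            yield from extract_coverages(value, path + (key,))
    elif isinstance(node, list):
        for item in node:  # lists don't add a path segment
            yield from extract_coverages(item, path)

# trimmed inline copy of the sample, standing in for json.load(f)
d = {
    "line_of_businesses": [{
        "asset_groups": {"commercial_situations": [{
            "property_asset": {"coverages": {"contents": {"sum_insured": 10000}}}
        }]},
        "assets": {"commercial_operations": [{
            "liability_asset": {"coverages": {
                "public_and_products_liability": {"limit_of_liability": 100000}
            }}
        }]},
    }]
}

# one generic array of coverage objects, ready for the new JSON schema
generic = {"coverages": list(extract_coverages(d))}

This avoids writing one extraction per building type: any new coverage added to the source model is picked up automatically, and the `source_path` metadata preserves where each generic object originated.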
Any advice on this is greatly appreciated. I am not certain that Python will end up being the sticky-plaster adapter I need, but a prototype may very well prove that it could easily be done. Thanks, Matt
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow