Flatten out nested JSON using Python

I'm currently trying to build an adapter that converts an undesirable, explicit data model into one that is more generic and scalable, and I thought I would use Python for at least the prototype. I have identified the patterns, so once I know how to select specific groups of data individually, I will attempt to produce generic objects within an array, each carrying some metadata parameters to make it all work nicely.

Where I am stuck is in reading the file properly. I am dealing with a deeply nested JSON file with no arrays, covering some 1,700 questions across an entire business. I have created a smaller example file which represents the bare minimum for a spike.

What I want to do is extract the values from both commercial_situations.property_asset.coverages and commercial_operations.liability_asset.coverages into two lists/dataframes, and then put them into a generic object structure with metadata, inside an array in a new JSON schema.
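For illustration, the kind of target shape I have in mind looks something like the sketch below. The field names (coverage_type, source, attributes) are placeholders of my own, not a finalised schema:

```json
{
  "coverages": [
    {
      "coverage_type": "contents",
      "source": "commercial_situations.property_asset",
      "attributes": {
        "basis_of_settlement": "REPLACEMENT",
        "sum_insured": 10000
      }
    },
    {
      "coverage_type": "public_and_products_liability",
      "source": "commercial_operations.liability_asset",
      "attributes": {
        "limit_of_liability": 100000
      }
    }
  ]
}
```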

{
  "line_of_businesses": [
    {
      "asset_groups": {
        "commercial_situations": [
          {
            "property_asset": {
              "coverages": {
                "contents": {
                  "basis_of_settlement": "REPLACEMENT",
                  "sum_insured": 10000
                }
              },
              "excess_details": {
                "excesses": [
                  {
                    "description": "Lorem Ipsum",
                    "imposed": 500,
                    "total": 500,
                    "type": "ACCIDENTAL_DAMAGE",
                    "unit": "CURRENCY",
                    "variable": 500,
                    "variable_origin": "BROKER"
                  }
                ]
              }
            }
          }
        ]
      },
      "assets": {
        "commercial_operations": [
          {
            "business_established_year": 2014,
            "occupation_details": {
              "consumer_occupation_code": "5243C",
              "insurer_occupation_code": "5243C"
            },
            "liability_asset": {
              "coverages": {
                "public_and_products_liability": {
                  "limit_of_liability": 100000
                }
              },
              "excess_details": {
                "excesses": [
                  {
                    "description": "Lorem Ipsum",
                    "imposed": 0,
                    "total": 500,
                    "type": "PROPERTY_DAMAGE",
                    "unit": "CURRENCY",
                    "variable": 500,
                    "variable_origin": "BROKER"
                  }
                ]
              },
              "notes": {
                "non_printable_notes": [
                  {
                    "description": "Description",
                    "note": "Lorem ipsum dolor sit amet, consectetur adipiscing",
                    "note_id": "512fd3a5-529a-4659-b483-6b0aba1a27a5"
                  }
                ],
                "printable_notes": [
                  {
                    "description": "Description",
                    "note": "Lorem ipsum dolor sit amet, consectetur adipiscing",
                    "note_id": "512fd3a5-529a-4659-b483-6b0aba1a27a5"
                  }
                ]
              }
            },
            "line_of_business_type": "COMMERCIAL_PACK"
          }
        ]
      },
      "opportunity_id": "aeda01d8-f48e-4bb6-a842-ba6e4f85dfe2",
      "thread_id": "c222cecf-af18-455a-9596-e76807d7c9be"
    }
  ]
}

Here is as far as I have got towards making this usable:

import json
import pandas as pd

# Note: 'from pandas.io.json import json_normalize' is deprecated;
# pd.json_normalize is the current entry point.
with open(r'D:\path\origin_test_input.json') as f:
    d = json.load(f)

request = pd.json_normalize(d['line_of_businesses'])
request.head(3)

That gets me to the point where I am looking at two values and two arrays. The next step is to go into the commercial_situations array, then into the property_asset object, and then into coverages. From there, I want to get the values from contents and list them as columns.
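Building on the snippet above, one way to reach the nested coverages is json_normalize's record_path and meta parameters. A minimal sketch, with a trimmed-down copy of the sample embedded inline so it stands alone:

```python
import pandas as pd

# Trimmed-down version of the sample document above.
d = {
    "line_of_businesses": [
        {
            "asset_groups": {
                "commercial_situations": [
                    {
                        "property_asset": {
                            "coverages": {
                                "contents": {
                                    "basis_of_settlement": "REPLACEMENT",
                                    "sum_insured": 10000,
                                }
                            }
                        }
                    }
                ]
            },
            "opportunity_id": "aeda01d8-f48e-4bb6-a842-ba6e4f85dfe2",
        }
    ]
}

# record_path walks down into each nested list; meta pulls the ids along
# so every flattened row keeps its context.
situations = pd.json_normalize(
    d["line_of_businesses"],
    record_path=["asset_groups", "commercial_situations"],
    meta=["opportunity_id"],
)

# Nested dicts become dotted column names, so the contents values are now
# ordinary columns such as 'property_asset.coverages.contents.sum_insured'.
coverage_cols = [c for c in situations.columns
                 if c.startswith("property_asset.coverages.")]
print(situations[coverage_cols])
```

The same call with record_path=["assets", "commercial_operations"] should reach the liability coverages on the full file.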

So my issue is with deeply nested values. I will have to repeat this process not only for contents but for every other building insurance type you can think of, so my script will produce a number of dataframes or lists which I will then aggregate into a generic object structure within an array.
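The aggregation step could be sketched as a small helper that walks any asset list and emits one generic object per coverage, so the same code handles contents, liability, and whatever else turns up. Field names such as coverage_type, source, and attributes are placeholders of my own:

```python
def extract_coverages(container, source, asset_key):
    """Return one generic object per coverage found under asset_key."""
    out = []
    for entry in container:
        coverages = entry.get(asset_key, {}).get("coverages", {})
        for coverage_type, attributes in coverages.items():
            out.append({
                "coverage_type": coverage_type,   # e.g. "contents"
                "source": source,                 # where it came from
                "attributes": attributes,         # the raw coverage values
            })
    return out

# Trimmed sample mirroring the document above.
lob = {
    "asset_groups": {
        "commercial_situations": [
            {"property_asset": {"coverages": {
                "contents": {"basis_of_settlement": "REPLACEMENT",
                             "sum_insured": 10000}}}}
        ]
    },
    "assets": {
        "commercial_operations": [
            {"liability_asset": {"coverages": {
                "public_and_products_liability": {"limit_of_liability": 100000}}}}
        ]
    },
}

generic = (
    extract_coverages(lob["asset_groups"]["commercial_situations"],
                      "commercial_situations.property_asset", "property_asset")
    + extract_coverages(lob["assets"]["commercial_operations"],
                        "commercial_operations.liability_asset", "liability_asset")
)
print(generic)
```

Because the helper just iterates whatever keys appear under coverages, adding another coverage type should not need any new code.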

Any advice on this is greatly appreciated. I am not certain that Python will end up being the sticky-plaster adapter I need, or I may very well build a prototype that proves it could easily be done. Thanks, Matt



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
