'Flatten nested JSON with pandas.json.normalize()

I'm requesting the Shopware 6 API for daily order data including some customer data (name, email) as well as the trackingID(s) and the ordered products that is the productID(s).

I get a response in the form of a nested JSON, for instance something like this:

[{'orderNumber': '9500285',
  'orderCustomer': {'email': '[email protected]',
   'firstName': 'John',
   'apiAlias': 'order_customer'},
  'deliveries': [{'trackingCodes': [12423238472347], 'apiAlias': 'order_delivery'}],
  'lineItems': [{'productId': '0073aa1ebfc684c296f68ea187b09438',
    'apiAlias': 'order_line_item'},
   {'productId': '46ae62ff2398586a235239d1ae675ab4',
    'apiAlias': 'order_line_item'}],
  'apiAlias': 'order'}]

or could be also a list of several orders, like so:

    [{'orderNumber': '9500285',
  'orderCustomer': {'email': '[email protected]',
   'firstName': 'John',
   'apiAlias': 'order_customer'},
  'deliveries': [{'trackingCodes': [], 'apiAlias': 'order_delivery'}],
  'lineItems': [{'productId': '0073aa1ebfc684c296f68ea187b09438',
    'apiAlias': 'order_line_item'},
   {'productId': '46ae62ff2398586a235239d1ae675ab4',
    'apiAlias': 'order_line_item'}],
  'apiAlias': 'order'},
 {'orderNumber': '9500273',
  'orderCustomer': {'email': '[email protected]',
   'firstName': 'Brad',
   'apiAlias': 'order_customer'},
  'deliveries': [{'trackingCodes': ['345345', '44978'],
    'apiAlias': 'order_delivery'}],
  'lineItems': [{'productId': None, 'apiAlias': 'order_line_item'}],
  'apiAlias': 'order'}]

What I ideally like to have is a pandas dataframe that looks like this:

orderNumber firstName email trackingCodes productIds
9500285 John [email protected] 0073aa1ebfc684c296f68ea187b09438, 46ae62ff2398586a235239d1ae675ab4
9500273 Brad [email protected] 345345, 44978

The trackingCodes can have several or none entries, as well as the productIds

I was able to get so far with the pandas json_normalize function:

df_track = pd.json_normalize(order_data, ["deliveries"], ["orderNumber", ["orderCustomer", "email"], ["orderCustomer", "firstName"]], errors='ignore')

which gives me

trackingCodes apiAlias orderNumber orderCustomer.email orderCustomer.firstName
[12423238472347] order_delivery 9500285 [email protected] John

When I'm trying to add lineItems for a json response of 1 order, like so:

df_track = pd.json_normalize(order_data, ["deliveries"], ["orderNumber", "lineItems", ["orderCustomer", "email"], ["orderCustomer", "firstName"]], errors='ignore')

I get the following error:

ValueError: Length of values (2) does not match length of index (1)

Any idea how to make the json_normalize function of pandas more flexible and independent of entries of the json response?



Solution 1:[1]

You can use pd.concat to concatenate two dataframe

df_track = pd.json_normalize(order_data, ["deliveries"], ["orderNumber", ["orderCustomer", "email"], ["orderCustomer", "firstName"]], errors='ignore')

df_track2 = pd.json_normalize(order_data, record_path=['lineItems'])

df_new = pd.concat([df_track,df_track2], axis=0)

df_new.reset_index(drop=True, inplace=True)

print(df_new)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gaston Alex