'Flatten nested JSON with pandas.json.normalize()
I'm requesting the Shopware 6 API for daily order data including some customer data (name, email) as well as the trackingID(s) and the ordered products that is the productID(s).
I get a response in the form of a nested JSON, for instance something like this:
[{'orderNumber': '9500285',
'orderCustomer': {'email': '[email protected]',
'firstName': 'John',
'apiAlias': 'order_customer'},
'deliveries': [{'trackingCodes': [12423238472347], 'apiAlias': 'order_delivery'}],
'lineItems': [{'productId': '0073aa1ebfc684c296f68ea187b09438',
'apiAlias': 'order_line_item'},
{'productId': '46ae62ff2398586a235239d1ae675ab4',
'apiAlias': 'order_line_item'}],
'apiAlias': 'order'}]
or could be also a list of several orders, like so:
[{'orderNumber': '9500285',
'orderCustomer': {'email': '[email protected]',
'firstName': 'John',
'apiAlias': 'order_customer'},
'deliveries': [{'trackingCodes': [], 'apiAlias': 'order_delivery'}],
'lineItems': [{'productId': '0073aa1ebfc684c296f68ea187b09438',
'apiAlias': 'order_line_item'},
{'productId': '46ae62ff2398586a235239d1ae675ab4',
'apiAlias': 'order_line_item'}],
'apiAlias': 'order'},
{'orderNumber': '9500273',
'orderCustomer': {'email': '[email protected]',
'firstName': 'Brad',
'apiAlias': 'order_customer'},
'deliveries': [{'trackingCodes': ['345345', '44978'],
'apiAlias': 'order_delivery'}],
'lineItems': [{'productId': None, 'apiAlias': 'order_line_item'}],
'apiAlias': 'order'}]
What I ideally like to have is a pandas dataframe that looks like this:
| orderNumber | firstName | trackingCodes | productIds | |
|---|---|---|---|---|
| 9500285 | John | [email protected] | 0073aa1ebfc684c296f68ea187b09438, 46ae62ff2398586a235239d1ae675ab4 | |
| 9500273 | Brad | [email protected] | 345345, 44978 |
The trackingCodes can have several or none entries, as well as the productIds
I was able to get so far with the pandas json_normalize function:
df_track = pd.json_normalize(order_data, ["deliveries"], ["orderNumber", ["orderCustomer", "email"], ["orderCustomer", "firstName"]], errors='ignore')
which gives me
| trackingCodes | apiAlias | orderNumber | orderCustomer.email | orderCustomer.firstName |
|---|---|---|---|---|
| [12423238472347] | order_delivery | 9500285 | [email protected] | John |
When I'm trying to add lineItems for a json response of 1 order, like so:
df_track = pd.json_normalize(order_data, ["deliveries"], ["orderNumber", "lineItems", ["orderCustomer", "email"], ["orderCustomer", "firstName"]], errors='ignore')
I get the following error:
ValueError: Length of values (2) does not match length of index (1)
Any idea how to make the json_normalize function of pandas more flexible and independent of entries of the json response?
Solution 1:[1]
You can use pd.concat to concatenate two dataframe
df_track = pd.json_normalize(order_data, ["deliveries"], ["orderNumber", ["orderCustomer", "email"], ["orderCustomer", "firstName"]], errors='ignore')
df_track2 = pd.json_normalize(order_data, record_path=['lineItems'])
df_new = pd.concat([df_track,df_track2], axis=0)
df_new.reset_index(drop=True, inplace=True)
print(df_new)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Gaston Alex |
