'json_normalize for nested ordereddict

i try to convert nested ordereddict using json_normalize. however for the nested ordereddict, it only manage to define header but put the values as NaN. below is the sample data for ordered dict

data = [OrderedDict([('name', 'HTTP'),
              ('or',
               OrderedDict([('name', '1'),
                            ('ip-port-tcp',
                             ['10001',
                              '9001'])]))]),
 OrderedDict([('name', 'L3045_1'),
              ('and',
               OrderedDict([('name', '1'),
                            ('or',
                             [OrderedDict([('name', '1'),
                                           ('ip-address',
                                            ['123.3.208.1',
                                             '122.255.199.2'])]),
                              OrderedDict([('name', '2'),
                                           ('ip-port-tcp',
                                            ['443', '80'])])])]))]),
 OrderedDict([('name', 'L3045_2'),
              ('and',
               OrderedDict([('name', '1'),
                            ('ip-port-tcp', '443'),
                            ('or',
                             OrderedDict([('name', '1'),
                                          ('ip-address',
                                           ['121.98.27.10',
                                            '10.241.58.30'])]))]))])
]

i get the output as below output of json normalize

as we can see in the above, the json_normalize can define the header in nested ordereddict as and.or.xxx however, the value is all store in and.or, where the expected output it should align between the header and the value.



Solution 1:[1]

I don't know how complex your data can be and I'm not sure a dataframe is the best way to represent what seems to be a logical combination of firewall rules. Anyway, for this specific column, you can first explode it to get new rows, then json_normalize the exploded column and finally update and join your original df with the renamed columns:

df = pd.json_normalize(data)

df = df.explode('and.or').reset_index(drop=True)
df_and_or = pd.json_normalize(df['and.or'])
df_and_or.columns = 'and.or.' + df_and_or.columns

df.update(df_and_or)
df = df.join(df_and_or[df_and_or.columns.difference(df.columns)])
print(df.drop('and.or', axis=1))

Output:

      name or.name or.ip-port-tcp and.name and.ip-port-tcp and.or.name             and.or.ip-address and.or.ip-port-tcp
0     HTTP       1  [10001, 9001]      NaN             NaN         NaN                           NaN                NaN
1  L3045_1     NaN            NaN        1             NaN           1  [123.3.208.1, 122.255.199.2]                NaN
2  L3045_1     NaN            NaN        1             NaN           2                           NaN          [443, 80]
3  L3045_2     NaN            NaN        1             443           1  [121.98.27.10, 10.241.58.30]                NaN

Edit: adding new columns to final result

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1