'Python Multinested Dictonary To Pandas Dataframe Boolean Table

I am looking for any advice on how to cleanly convert a python multi level nested dictionary (from JSON) into a data frame boolean table.

Rules:

  • Only True is recorded. If empty it is False.
  • The list may be of N Length
  • The groups may be of N length
  • The bools may be of N types

Example Input:

{1:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
 2:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
'n':{'group_a':{'bool_a':True,
                'bool_c':True},
     'group_n':{'bool_b':True}
   },
}

Desired Output:

   Ga_Ba, Ga_Bb, Ga_Bc, Ga_Bn, Gn_Ba, Gn_Bb, ... Gn_Bn....
1   True   True  False  True   False   True       True
2   True   True  False  True   False   True       True
n   True   False True   False  False   False      False
...

Ideas? Bonus points for speed and conciseness. I have a solution but I am looking for something more elegant than the for loop mess I have now. Alternative data structures may also be welcome.



Solution 1:[1]

Goofy method #1

s = pd.DataFrame.from_dict(data, orient='index').stack()

pd.json_normalize(s).set_index(s.index) \
  .stack().unstack([1, 2], fill_value=False) \
  .sort_index(axis=1)

  group_a                      group_n       
   bool_a bool_b bool_c bool_n  bool_b bool_n
1    True   True  False   True    True   True
2    True   True  False   True    True   True
3    True  False   True  False    True  False

Straight from comprehension

pd.DataFrame.from_dict({
    k0: {
            f'G{k1.split("_")[1]}_B{k2.split("_")[1]}': val
            for k1, d1 in d0.items()
            for k2, val in d1.items()
        }
    for k0, d0 in data.items()
}, orient='index').fillna(False)

   Ga_Ba  Ga_Bb  Ga_Bn  Gn_Bb  Gn_Bn  Ga_Bc
1   True   True   True   True   True  False
2   True   True   True   True   True  False
3   True  False  False   True  False   True

Solution 2:[2]

You could use a dictionary comprehension and concat:

import pandas as pd

values = {
    "1": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "2": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "n": {"group_a": {"bool_a": True, "bool_c": True}, "group_n": {"bool_b": True}},
}
stacked_values = {k: pd.DataFrame(v).stack() for k, v in values.items()}
df = (
    pd.concat(stacked_values, axis=1)
    .T.fillna(False)
    .swaplevel(axis=1)  # optional
    .sort_index(axis=1)
)

Output:

  group_a                      group_n       
   bool_a bool_b bool_c bool_n  bool_b bool_n
1    True   True  False   True    True   True
2    True   True  False   True    True   True
n    True  False   True  False    True  False

Solution 3:[3]

try using a dataframe pivot_table

dct={1:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
 2:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
'n':{'group_a':{'bool_a':True,
                'bool_c':True},
     'group_n':{'bool_b':True}
   },
}
df=pd.DataFrame.from_dict(dct, orient='index')
df=df.stack()
lst=[]
for k,vals in df.items():
    for index,item in vals.items():
        lst.append((k[0],k[1],index,item))
        
df=pd.DataFrame(lst,columns=['col1','col2','col3','col4'])
#print(df)
fp=df.pivot_table(index='col1',columns=['col2','col3'],values='col4', aggfunc=np.any).fillna(False)
print(fp)

output:

col2 group_a                      group_n       
col3  bool_a bool_b bool_c bool_n  bool_b bool_n
col1                                            
1       True   True  False   True    True   True
2       True   True  False   True    True   True
n       True  False   True  False    True  False

Solution 4:[4]

Try:

df = pd.DataFrame(
    [
        {f"{k}_{kk}": vv for k, v in d.items() for kk, vv in v.items()}
        for d in data.values()
    ],
    index=data,
).fillna(False)

print(df)

Prints:

   group_a_bool_a  group_a_bool_b  group_a_bool_n  group_n_bool_b  group_n_bool_n  group_a_bool_c
1            True            True            True            True            True           False
2            True            True            True            True            True           False
n            True           False           False            True           False            True

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Bill DeRose
Solution 3 Golden Lion
Solution 4