'Python Multinested Dictonary To Pandas Dataframe Boolean Table
I am looking for any advice on how to cleanly convert a python multi level nested dictionary (from JSON) into a data frame boolean table.
Rules:
- Only True is recorded. If empty it is False.
- The list may be of N Length
- The groups may be of N length
- The bools may be of N types
Example Input:
{1:{'group_a':{'bool_a':True,
'bool_b':True,
'bool_n':True},
'group_n':{'bool_b':True,
'bool_n':True}
},
2:{'group_a':{'bool_a':True,
'bool_b':True,
'bool_n':True},
'group_n':{'bool_b':True,
'bool_n':True}
},
'n':{'group_a':{'bool_a':True,
'bool_c':True},
'group_n':{'bool_b':True}
},
}
Desired Output:
Ga_Ba, Ga_Bb, Ga_Bc, Ga_Bn, Gn_Ba, Gn_Bb, ... Gn_Bn....
1 True True False True False True True
2 True True False True False True True
n True False True False False False False
...
Ideas? Bonus points for speed and conciseness. I have a solution but I am looking for something more elegant than the for loop mess I have now. Alternative data structures may also be welcome.
Solution 1:[1]
Goofy method #1
s = pd.DataFrame.from_dict(data, orient='index').stack()
pd.json_normalize(s).set_index(s.index) \
.stack().unstack([1, 2], fill_value=False) \
.sort_index(axis=1)
group_a group_n
bool_a bool_b bool_c bool_n bool_b bool_n
1 True True False True True True
2 True True False True True True
3 True False True False True False
Straight from comprehension
pd.DataFrame.from_dict({
k0: {
f'G{k1.split("_")[1]}_B{k2.split("_")[1]}': val
for k1, d1 in d0.items()
for k2, val in d1.items()
}
for k0, d0 in data.items()
}, orient='index').fillna(False)
Ga_Ba Ga_Bb Ga_Bn Gn_Bb Gn_Bn Ga_Bc
1 True True True True True False
2 True True True True True False
3 True False False True False True
Solution 2:[2]
You could use a dictionary comprehension and concat:
import pandas as pd
values = {
"1": {
"group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
"group_n": {"bool_b": True, "bool_n": True},
},
"2": {
"group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
"group_n": {"bool_b": True, "bool_n": True},
},
"n": {"group_a": {"bool_a": True, "bool_c": True}, "group_n": {"bool_b": True}},
}
stacked_values = {k: pd.DataFrame(v).stack() for k, v in values.items()}
df = (
pd.concat(stacked_values, axis=1)
.T.fillna(False)
.swaplevel(axis=1) # optional
.sort_index(axis=1)
)
Output:
group_a group_n
bool_a bool_b bool_c bool_n bool_b bool_n
1 True True False True True True
2 True True False True True True
n True False True False True False
Solution 3:[3]
try using a dataframe pivot_table
dct={1:{'group_a':{'bool_a':True,
'bool_b':True,
'bool_n':True},
'group_n':{'bool_b':True,
'bool_n':True}
},
2:{'group_a':{'bool_a':True,
'bool_b':True,
'bool_n':True},
'group_n':{'bool_b':True,
'bool_n':True}
},
'n':{'group_a':{'bool_a':True,
'bool_c':True},
'group_n':{'bool_b':True}
},
}
df=pd.DataFrame.from_dict(dct, orient='index')
df=df.stack()
lst=[]
for k,vals in df.items():
for index,item in vals.items():
lst.append((k[0],k[1],index,item))
df=pd.DataFrame(lst,columns=['col1','col2','col3','col4'])
#print(df)
fp=df.pivot_table(index='col1',columns=['col2','col3'],values='col4', aggfunc=np.any).fillna(False)
print(fp)
output:
col2 group_a group_n
col3 bool_a bool_b bool_c bool_n bool_b bool_n
col1
1 True True False True True True
2 True True False True True True
n True False True False True False
Solution 4:[4]
Try:
df = pd.DataFrame(
[
{f"{k}_{kk}": vv for k, v in d.items() for kk, vv in v.items()}
for d in data.values()
],
index=data,
).fillna(False)
print(df)
Prints:
group_a_bool_a group_a_bool_b group_a_bool_n group_n_bool_b group_n_bool_n group_a_bool_c
1 True True True True True False
2 True True True True True False
n True False False True False True
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Bill DeRose |
| Solution 3 | Golden Lion |
| Solution 4 |
