'Extract data from nested JSON | Pandas

I'm dealing with a nested JSON in order to extract data about transactions from my database using pandas.

My JSON can have one of these contents :

{"Data":{"Parties":[{"ID":"JackyID","Role":12}],"NbIDs":1}} #One party identified
{"Data":{"Parties":[{"ID":"JackyID","Role":12},{"ID":"SamNumber","Role":10}],"NbIDs":2}} #Two Parties identified
{"Data":{"Parties":[],"NbIDs":0}} #No parties identified
{"Data": None} #No data

When looking to extract the values of ID (ID of the party - String datatype) and Role (Int datatype - refer to buyers when Role=12 and sellers when Role=10) and write it in a pandas dataframe, I'm using the following code :

for i,row in df.iterrows():
    json_data = json.dumps(row['Data'])
    data = pd_json.loads(json_data)
data_json = json.loads(data)
df['ID'] = pd.json_normalize(data_json, ['Data', 'Parties'])['ID']
df['Role'] = pd.json_normalize(data_json, ['Data', 'Parties'])['Role']

Now when trying to check its values and give every Role its correspending ID:

for i,row in df.iterrows():
    if row['Role'] == 12:
        df.at[i,'Buyer'] = df.at[i,'ID']
    elif row['Role'] == 10:
        df.at[i,'Seller'] = df.at[i,'ID']

df = df[['Buyer', 'Seller']]

The expected df result for the given scenario should be as below :

{"Data":{"Parties":[{"ID":"JackyID","Role":12}],"NbIDs":1}} #Transaction 1
{"Data":{"Parties":[{"ID":"JackyID","Role":12},{"ID":"SamNumber","Role":10}],"NbIDs":2}} #Transaction 2
{"Data":{"Parties":[],"NbIDs":0}} #Transaction 3
{"Data": None} #Transaction 4
>>print(df)
Buyer  | Seller
------------------
JackyID|              #Transaction 1 we have info about the buyer
JackyID| SamNumber    #Transaction 2 we have infos about the buyer and the seller
       |              #Transaction 3 we don't have any infos about the parties
       |              #Transaction 4 we don't have any infos about the parties

What is the correct way to do so ?



Solution 1:[1]

You can special consider case 4 where there is no Data as empty Parties

df = pd.DataFrame(data['Data']['Parties'] if data['Data'] else [], columns=['ID', 'Role'])
df['Role'] = df['Role'].map({10: 'Seller', 12: 'Buyer'})

Then add possible missing values for Role

df = df.set_index('Role').reindex(['Seller', 'Buyer'], fill_value=pd.NA).T
print(df)

# Case 1
Role Seller    Buyer
ID     <NA>  JackyID

# Case 2
Role     Seller    Buyer
ID    SamNumber  JackyID

# Case 3
Role Seller Buyer
ID     <NA>  <NA>

# Case 4
Role Seller Buyer
ID     <NA>  <NA>

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ynjxsjmh