'Extract data from nested JSON | Pandas
I'm dealing with a nested JSON in order to extract data about transactions from my database using pandas.
My JSON can have one of these contents :
{"Data":{"Parties":[{"ID":"JackyID","Role":12}],"NbIDs":1}} #One party identified
{"Data":{"Parties":[{"ID":"JackyID","Role":12},{"ID":"SamNumber","Role":10}],"NbIDs":2}} #Two Parties identified
{"Data":{"Parties":[],"NbIDs":0}} #No parties identified
{"Data": None} #No data
When looking to extract the values of ID (ID of the party - String datatype) and Role (Int datatype - refer to buyers when Role=12 and sellers when Role=10) and write it in a pandas dataframe, I'm using the following code :
for i,row in df.iterrows():
json_data = json.dumps(row['Data'])
data = pd_json.loads(json_data)
data_json = json.loads(data)
df['ID'] = pd.json_normalize(data_json, ['Data', 'Parties'])['ID']
df['Role'] = pd.json_normalize(data_json, ['Data', 'Parties'])['Role']
Now when trying to check its values and give every Role its correspending ID:
for i,row in df.iterrows():
if row['Role'] == 12:
df.at[i,'Buyer'] = df.at[i,'ID']
elif row['Role'] == 10:
df.at[i,'Seller'] = df.at[i,'ID']
df = df[['Buyer', 'Seller']]
The expected df result for the given scenario should be as below :
{"Data":{"Parties":[{"ID":"JackyID","Role":12}],"NbIDs":1}} #Transaction 1
{"Data":{"Parties":[{"ID":"JackyID","Role":12},{"ID":"SamNumber","Role":10}],"NbIDs":2}} #Transaction 2
{"Data":{"Parties":[],"NbIDs":0}} #Transaction 3
{"Data": None} #Transaction 4
>>print(df)
Buyer | Seller
------------------
JackyID| #Transaction 1 we have info about the buyer
JackyID| SamNumber #Transaction 2 we have infos about the buyer and the seller
| #Transaction 3 we don't have any infos about the parties
| #Transaction 4 we don't have any infos about the parties
What is the correct way to do so ?
Solution 1:[1]
You can special consider case 4 where there is no Data as empty Parties
df = pd.DataFrame(data['Data']['Parties'] if data['Data'] else [], columns=['ID', 'Role'])
df['Role'] = df['Role'].map({10: 'Seller', 12: 'Buyer'})
Then add possible missing values for Role
df = df.set_index('Role').reindex(['Seller', 'Buyer'], fill_value=pd.NA).T
print(df)
# Case 1
Role Seller Buyer
ID <NA> JackyID
# Case 2
Role Seller Buyer
ID SamNumber JackyID
# Case 3
Role Seller Buyer
ID <NA> <NA>
# Case 4
Role Seller Buyer
ID <NA> <NA>
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ynjxsjmh |
