Build data frame from list with nested dictionaries
I need to build a dataframe from a list of nested dictionaries. The list is something like this:
[
{'user': {'id': '0011c9a5-d870-4a4d-b32c-689e73f11049'}, 'userContent': {'gy_cv': {}}},
{'user': {'id': '001e6168-8034-41a1-8d3b-afde984aa5e8'}, 'userContent': {'gy_cv': {}}},
{'user': {'id': '00248a0e-9bc8-47a9-9955-77363772d5cf'}, 'userContent': {'gy_cv': {'checker': {'intro': 'yes'}}, 'covidMessages': {'messages': {'callDoctorOrTelemed': True, 'callAheadER': False, 'goToER': False, 'quarantine': False, 'dontSpread': True, 'seriousWarning': True, 'lowRisk': False, 'watchForSymptoms': False}}}},
{'user': {'id': '002ac869-2745-440d-95d2-6641ccb12340'}, 'userContent': {'gy_cv': {'checker': {'intro': 'yes'}}, 'covidMessages': {'messages': {'callDoctorOrTelemed': True, 'callAheadER': False, 'goToER': False, 'quarantine': False, 'dontSpread': True, 'seriousWarning': False, 'lowRisk': False, 'watchForSymptoms': False}}}}
]
And I need to get this output:
| Id | Intro | callDoctorOrTelemed | callAheadER | goToER | quarantine | dontSpread | seriousWarning | lowRisk | watchForSymptoms |
|---|---|---|---|---|---|---|---|---|---|
| 0011c9a5-d870-4a4d-b32c-689e73f11049 | | | | | | | | | |
| 001e6168-8034-41a1-8d3b-afde984aa5e8 | | | | | | | | | |
| 00248a0e-9bc8-47a9-9955-77363772d5cf | YES | TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE |
I've searched, but none of the solutions I found worked.
Thank you
Solution 1:[1]
You could use glom to extract the data:
import pandas as pd
from json import loads
from glom import glom
data = loads('[{"user":{"id":"0011c9a5-d870-4a4d-b32c-689e73f11049"},"userContent":{"gy_cv":{}}},{"user":{"id":"001e6168-8034-41a1-8d3b-afde984aa5e8"},"userContent":{"gy_cv":{}}},{"user":{"id":"00248a0e-9bc8-47a9-9955-77363772d5cf"},"userContent":{"gy_cv":{"checker":{"intro":"yes"}},"covidMessages":{"messages":{"callDoctorOrTelemed":true,"callAheadER":false,"goToER":false,"quarantine":false,"dontSpread":true,"seriousWarning":true,"lowRisk":false,"watchForSymptoms":false}}}},{"user":{"id":"002ac869-2745-440d-95d2-6641ccb12340"},"userContent":{"gy_cv":{"checker":{"intro":"yes"}},"covidMessages":{"messages":{"callDoctorOrTelemed":true,"callAheadER":false,"goToER":false,"quarantine":false,"dontSpread":true,"seriousWarning":false,"lowRisk":false,"watchForSymptoms":false}}}}]')
extracted_data = {
    'Id': [glom(d, 'user.id', default=None) for d in data],
    'Intro': [glom(d, 'userContent.gy_cv.checker.intro', default='no') for d in data],
    'callDoctorOrTelemed': [glom(d, 'userContent.covidMessages.messages.callDoctorOrTelemed', default=False) for d in data],
    'callAheadER': [glom(d, 'userContent.covidMessages.messages.callAheadER', default=False) for d in data],
    'goToER': [glom(d, 'userContent.covidMessages.messages.goToER', default=False) for d in data],
    'quarantine': [glom(d, 'userContent.covidMessages.messages.quarantine', default=False) for d in data],
    'dontSpread': [glom(d, 'userContent.covidMessages.messages.dontSpread', default=False) for d in data],
    'seriousWarning': [glom(d, 'userContent.covidMessages.messages.seriousWarning', default=False) for d in data],
    'lowRisk': [glom(d, 'userContent.covidMessages.messages.lowRisk', default=False) for d in data],
    'watchForSymptoms': [glom(d, 'userContent.covidMessages.messages.watchForSymptoms', default=False) for d in data],
}
df = pd.DataFrame(data=extracted_data)
print(df)
Output:
Id Intro callDoctorOrTelemed callAheadER goToER quarantine dontSpread seriousWarning lowRisk watchForSymptoms
0 0011c9a5-d870-4a4d-b32c-689e73f11049 no False False False False False False False False
1 001e6168-8034-41a1-8d3b-afde984aa5e8 no False False False False False False False False
2 00248a0e-9bc8-47a9-9955-77363772d5cf yes True False False False True True False False
3 002ac869-2745-440d-95d2-6641ccb12340 yes True False False False True False False False
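If installing glom is not an option, the same dotted-path lookup can be approximated with the standard library. The following is a minimal sketch under stated assumptions, not glom's implementation: `get_path`, `flatten`, and `MESSAGE_KEYS` are hypothetical names introduced here, and missing values default to None (instead of 'no' or False) so that absent fields stay empty, as in the desired output of the question. Only two of the records from the question are used, to keep the example short.

```python
from functools import reduce

# Sentinel so that a stored None is not mistaken for a missing key.
_MISSING = object()


def get_path(record, dotted_path, default=None):
    """Follow a dotted path through nested dicts; return default if any key is absent."""
    def step(node, key):
        if isinstance(node, dict) and key in node:
            return node[key]
        return _MISSING

    found = reduce(step, dotted_path.split('.'), record)
    return default if found is _MISSING else found


# Hypothetical helper list: the message flags shown in the question.
MESSAGE_KEYS = [
    'callDoctorOrTelemed', 'callAheadER', 'goToER', 'quarantine',
    'dontSpread', 'seriousWarning', 'lowRisk', 'watchForSymptoms',
]


def flatten(record):
    """Turn one nested record into a flat dict with one entry per output column."""
    row = {
        'Id': get_path(record, 'user.id'),
        'Intro': get_path(record, 'userContent.gy_cv.checker.intro'),
    }
    for key in MESSAGE_KEYS:
        row[key] = get_path(record, 'userContent.covidMessages.messages.' + key)
    return row


data = [
    {'user': {'id': '0011c9a5-d870-4a4d-b32c-689e73f11049'},
     'userContent': {'gy_cv': {}}},
    {'user': {'id': '00248a0e-9bc8-47a9-9955-77363772d5cf'},
     'userContent': {
         'gy_cv': {'checker': {'intro': 'yes'}},
         'covidMessages': {'messages': {
             'callDoctorOrTelemed': True, 'callAheadER': False,
             'goToER': False, 'quarantine': False, 'dontSpread': True,
             'seriousWarning': True, 'lowRisk': False,
             'watchForSymptoms': False}}}},
]

# One flat dict per record; keys are in the column order of the desired output.
rows = [flatten(d) for d in data]
```

Passing `rows` to `pd.DataFrame(rows)` should then give one column per key, with the missing fields left empty rather than filled with 'no' or False.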
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
