Build Relational Database from JSON using Python
I have a JSON file that I need to analyze and visualize, and I believe the best way to do that is to turn it into a relational database. I am using Python 3.
Here are the steps I have taken so far:
- Downloaded the JSON from an API
- Cleaned the JSON into a list of dictionaries (some of the values are lists)
```python
result_list[0]  # len(result_list) is about 1 million
{'country_manufactured': 'USA',
 'brand_name': 'xyz',
 'patient_problems': ['Abscess',
                      'Hemorrhage/Bleeding',
                      'Unspecified Infection',
                      'Inflammation'],
 'product_problems': ['Fracture',
                      'Failure to Osseointegrate',
                      'Loss of Osseointegration',
                      'Separation Failure',
                      'Malposition of Device',
                      'Device Damaged by Another Device',
                      'Material Deformation',
                      'Osseointegration Problem'],
 'date_of_report': '20200101'}
```
Some keys, such as `country_manufactured`, `brand_name`, and `date_of_report`, can be used directly as columns. The values of `patient_problems`, however, are drawn from a fixed table of 300+ possible values (which I have elsewhere in a DataFrame), and the same is true of `product_problems`.
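Before designing the tables, it can help to confirm which keys are scalar and which hold lists across all records, and that every list value actually appears in the fixed lookup table. A minimal sketch in plain Python; `classify_keys`, `unknown_values`, and the sample record are illustrative helpers, not part of the original question:

```python
def classify_keys(records):
    """Split keys into scalar-valued and list-valued, across all records.

    A key counts as list-valued if it holds a list in at least one record,
    since those are the keys that need their own junction table.
    """
    all_keys, list_keys = set(), set()
    for record in records:
        for key, value in record.items():
            all_keys.add(key)
            if isinstance(value, list):
                list_keys.add(key)
    return sorted(all_keys - list_keys), sorted(list_keys)


def unknown_values(records, key, vocabulary):
    """Return the values under `key` that are missing from `vocabulary`."""
    # `or []` treats a missing key or a None value as an empty list.
    seen = {value for record in records for value in (record.get(key) or [])}
    return seen - set(vocabulary)


# Hypothetical sample record in the same shape as result_list[0].
sample = [
    {"country_manufactured": "USA", "brand_name": "xyz",
     "patient_problems": ["Abscess", "Inflammation"],
     "product_problems": ["Fracture"], "date_of_report": "20200101"},
]
scalar_keys, list_keys = classify_keys(sample)
print(scalar_keys)  # ['brand_name', 'country_manufactured', 'date_of_report']
print(list_keys)    # ['patient_problems', 'product_problems']
print(unknown_values(sample, "patient_problems", ["Abscess", "Hemorrhage/Bleeding"]))
# {'Inflammation'}
```

Running `unknown_values` against the real 300+ value table before loading anything catches spelling variants that would otherwise break the lookup later.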
- I flattened the JSON into columns `patient_problems0`, `patient_problems1`, etc., but that is the wrong approach.

How do I create a junction (associative) table to solve this many-to-many problem?
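Flattened `patient_problems0`, `patient_problems1`, ... columns are not a dead end: `pandas.melt` can turn that wide layout back into the long, one-row-per-(report, problem) shape a junction table needs. A minimal sketch, assuming the flattened frame has an `ID` column and that unused slots hold NaN/None; the data below is made up:

```python
import pandas as pd

# Hypothetical wide table, as produced by flattening each list into
# patient_problems0, patient_problems1, ... columns. Shorter lists leave None.
wide = pd.DataFrame({
    "ID": [0, 1],
    "brand_name": ["xyz", "abc"],
    "patient_problems0": ["Abscess", "Hemorrhage/Bleeding"],
    "patient_problems1": ["Hemorrhage/Bleeding", None],
})

problem_cols = [c for c in wide.columns if c.startswith("patient_problems")]

# melt turns each (ID, patient_problemsN) cell into its own row;
# dropna discards the padding cells from records with fewer problems.
long_df = (
    wide.melt(id_vars="ID", value_vars=problem_cols, value_name="patient_problem")
        .dropna(subset=["patient_problem"])
        .drop(columns="variable")
        .sort_values(["ID", "patient_problem"])
        .reset_index(drop=True)
)
print(long_df)
```

The problem names in `long_df` still need to be swapped for the `problem_ID`s in `patient_problems_df`, for example with a merge on the problem name.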
`result_df`:
| ID | brand_name | country_name | patient_ID | device_ID |
|---|---|---|---|---|
| 0 | xyz | USA | 0 | 0 |
| 1 | abc | Brazil | 1 | 1 |
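The main table can come straight from the list of dictionaries by dropping the list-valued keys and using the row position as the primary key. A minimal sketch with pandas, using two made-up records; it assumes the junction tables will reference this `ID` directly (in the example above `patient_ID` equals `ID`), and `LIST_KEYS` is an illustrative name:

```python
import pandas as pd

# Two made-up records shaped like the entries of result_list.
result_list = [
    {"country_manufactured": "USA", "brand_name": "xyz",
     "patient_problems": ["Abscess", "Hemorrhage/Bleeding"],
     "product_problems": ["Fracture"], "date_of_report": "20200101"},
    {"country_manufactured": "Brazil", "brand_name": "abc",
     "patient_problems": ["Hemorrhage/Bleeding"],
     "product_problems": [], "date_of_report": "20200102"},
]

# Keys whose values are lists; these go to junction tables instead.
LIST_KEYS = ["patient_problems", "product_problems"]

records = pd.DataFrame(result_list)

# Keep only the scalar columns. rename_axis + reset_index turns the
# 0..n-1 row index into an explicit "ID" column (the primary key).
result_df = (
    records.drop(columns=LIST_KEYS)
           .rename_axis("ID")
           .reset_index()
)
print(result_df)
```

Keeping `records` around lets the same row index serve as the foreign key when the junction tables are built from its list columns.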
`junction_df_patient` (this is the table I need help building):
| patient_ID | problem_ID |
|---|---|
| 0 | 5 |
| 0 | 12 |
| 1 | 12 |
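A table like this can be built with `DataFrame.explode`, which produces one row per element of a list column, followed by a merge against the lookup table to swap problem names for IDs. A minimal sketch with made-up data; it assumes the junction's `patient_ID` is the report's `ID`, and the unknown-name check is an added safeguard rather than part of the question:

```python
import pandas as pd

# Made-up inputs: one list of problem names per report, plus the lookup table.
reports = pd.DataFrame({
    "ID": [0, 1],
    "patient_problems": [["Abscess", "Hemorrhage/Bleeding"], ["Hemorrhage/Bleeding"]],
})
patient_problems_df = pd.DataFrame({
    "problem_ID": [5, 12],
    "patient_problem": ["Abscess", "Hemorrhage/Bleeding"],
})

# explode: one row per (report, problem name); an empty list becomes NaN.
pairs = (
    reports[["ID", "patient_problems"]]
    .explode("patient_problems")
    .dropna(subset=["patient_problems"])
    .rename(columns={"ID": "patient_ID", "patient_problems": "patient_problem"})
)

# Look up each name's problem_ID. validate="m:1" raises if a name occurs
# more than once in the lookup table; how="left" keeps unmatched names as NaN.
merged = pairs.merge(patient_problems_df, on="patient_problem",
                     how="left", validate="m:1")
unknown = merged.loc[merged["problem_ID"].isna(), "patient_problem"].unique()
if len(unknown):
    raise ValueError(f"problems missing from the lookup table: {list(unknown)}")

# drop_duplicates guards against a report listing the same problem twice,
# so (patient_ID, problem_ID) can serve as the junction's primary key.
junction_df_patient = (
    merged[["patient_ID", "problem_ID"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(junction_df_patient)
```

The same pattern, with a product-problem lookup table, would produce the junction table for `product_problems`.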
`patient_problems_df`, a table of all possible patient problems:
| problem_ID | patient_problem |
|---|---|
| 5 | Abscess |
| 12 | Hemorrhage/Bleeding |
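Once the three tables exist, they can be stored in an actual relational database so the many-to-many link is enforced by keys and can be queried with joins. A minimal sketch using Python's built-in `sqlite3` module and the example rows from the tables above; the table names `result`, `patient_problems`, and `junction_patient` are illustrative, and it assumes `patient_ID` is the report's `ID`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE result (
    ID INTEGER PRIMARY KEY,
    brand_name TEXT,
    country_name TEXT
);
CREATE TABLE patient_problems (
    problem_ID INTEGER PRIMARY KEY,
    patient_problem TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (report, problem) pair.
CREATE TABLE junction_patient (
    patient_ID INTEGER NOT NULL REFERENCES result(ID),
    problem_ID INTEGER NOT NULL REFERENCES patient_problems(problem_ID),
    PRIMARY KEY (patient_ID, problem_ID)
);
""")

conn.executemany("INSERT INTO result VALUES (?, ?, ?)",
                 [(0, "xyz", "USA"), (1, "abc", "Brazil")])
conn.executemany("INSERT INTO patient_problems VALUES (?, ?)",
                 [(5, "Abscess"), (12, "Hemorrhage/Bleeding")])
conn.executemany("INSERT INTO junction_patient VALUES (?, ?)",
                 [(0, 5), (0, 12), (1, 12)])
conn.commit()

# Join through the junction table: how many reports mention each problem?
counts = conn.execute("""
    SELECT p.patient_problem, COUNT(*) AS n_reports
    FROM junction_patient AS j
    JOIN patient_problems AS p ON p.problem_ID = j.problem_ID
    GROUP BY p.patient_problem
    ORDER BY n_reports DESC, p.patient_problem
""").fetchall()
print(counts)  # [('Hemorrhage/Bleeding', 2), ('Abscess', 1)]
```

If the tables are already DataFrames, `DataFrame.to_sql(name, conn, if_exists="append", index=False)` can fill the same pre-created tables instead of `executemany`.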
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
