'Pandas flattening nested jsons
so this is probably going to be a duplicate question but i'll make a try since I have not found anything.
I am trying to flatten a json with pandas, normal work. Looking at the examples of the docs here is the closest example for what I am trying to do:
data = [{'state': 'Florida',
'shortname': 'FL',
'info': {'governor': 'Rick Scott'},
'counties': [{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]},
{'state': 'Ohio',
'shortname': 'OH',
'info': {'governor': 'John Kasich'},
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}]
result = pd.json_normalize(data, 'counties', ['state', 'shortname',
['info', 'governor']])
result
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
However, this example show us a way to get the data inside counties flatten with alongside the column state and shortname.
Let's say that I have n number of columns at the root of each json object ( n number of state or shortname columns in the example above ). How do I include them all, in order to flat the counties but keep everything else that is adjacent?
First I tried things like these:
#None to treat data as a list of records
#Result of counties is still nested, not working
result = pd.json_normalize(data, None, ['counties'])
or
result = pd.json_normalize(data, None, ['counties', 'name'])
Then I Thought of getting the columns with dataframe.columns and reuse it since meta argument of json_normalize can take array of string.
But i'm stuck. and columns appear to return nested json attribute which I don't want to.
#still nested
cols = pd.json_normalize(data).columns.to_list()
#Exclude it because we already have it
cols = [index for index in cols if index != 'counties']
#remove nested columns if any
cols = [index for index in cols if "." not in index]
result = pd.json_normalize(data, 'counties', cols, errors="ignore")
#still nested
name population state shortname ... other6 other7 counties info.governor
0 Dade 12345 Florida FL ... dumb_data dumb_data [{'name': 'Dade', 'population': 12345}, {'name... NaN
1 Broward 40000 Florida FL ... dumb_data dumb_data [{'name': 'Dade', 'population': 12345}, {'name... NaN
2 Palm Beach 60000 Florida FL ... dumb_data dumb_data [{'name': 'Dade', 'population': 12345}, {'name... NaN
3 Summit 1234 Ohio OH ... dumb_data dumb_data [{'name': 'Summit', 'population': 1234}, {'nam... NaN
4 Cuyahoga 1337 Ohio OH ... dumb_data dumb_data [{'name': 'Summit', 'population': 1234}, {'nam... NaN
I would prefere not to just harcode the column names since they change and that for this case I have 64 of them...
For better understanding, this is the real kind of data i'm working on from Woo Rest API. I am not using it here because its really long, but basically I am trying to flat line_items keeping only product_id inside it and of course all the other columns which is adjacent to line_items.
Solution 1:[1]
Okay so guys if you want to flatten a json and keeping everything else, you should used pd.Dataframe.explode()
Here is my logic:
import pandas as pd
data = [
{'state': 'Florida',
'shortname': 'FL',
'info': {'governor': 'Rick Scott'},
'counties': [
{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}
]
},
{'state': 'Ohio',
'shortname': 'OH',
'info': {'governor': 'John Kasich'},
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}
]
#No Formating only converting to a Df
result = pd.json_normalize(data)
#Exploding the wanted nested column
exploded = result.explode('counties')
#Keeping the name only - this can be custom
exploded['countie_name'] = exploded['counties'].apply(lambda x: x['name'])
#Drop the used column since we took what interested us inside it.
exploded = exploded.drop(['counties'], axis=1)
print(exploded)
#Duplicate for Florida, as wanted with diferent countie names
state shortname info.governor countie_name
0 Florida FL Rick Scott Dade
0 Florida FL Rick Scott Broward
0 Florida FL Rick Scott Palm Beach
1 Ohio OH John Kasich Summit
1 Ohio OH John Kasich Cuyahoga
Imagine you have the content of a basket of product as a nested json, to explode the content of the basket while keeping the general basket attributes, you can do this.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Justin |
