Python: how do I convert a complex, multiply nested JSON file to a CSV file?
Here is my code. It converts only part of the JSON file; it fails to flatten all of the nested fields.
import pandas as pd
import json

all_data = []
add_header = True
with open('C:\\Users\\jeri\\Desktop\\1.json', encoding='utf-8') as f_json:
    for line in f_json:
        line = line.strip()
        if line:
            all_data.append(json.loads(line))

df = pd.json_normalize(all_data)
df.to_csv('C:\\Users\\jeri\\Desktop\\11.csv', index=False, encoding='utf-8', header=add_header)
add_header = False
My JSON file:
{"id":"aa","sex":"male","name":[{"Fn":"jeri","Ln":"teri"}],"age":45,"info":[{"address":{"State":"NY","City":"new york"},"start_date":"2001-09","title":{"name":"Doctor","Exp":"head"},"year":"2001","month":"05"}],"other":null,"Hobby":[{"smoking":null,"gamble":null}],"connect":[{"phone":"123456789","email":"[email protected]"}],"Education":"MBA","School":{"State":"NY","City":"new york"}}
{"id":"aa","sex":"female","name":[{"Fn":"lo","Ln":"li"}],"age":34,"info":[{"address":{"State":"NY","City":"new york"},"start_date":"2008-11","title":{"name":"Doctor","Exp":"hand"},"year":"2008","month":"02"}],"other":null,"Hobby":[{"smoking":null,"gamble":null}],"connect":[{"phone":"123456789","email":"[email protected]"}],"Education":"MBA","School":{"State":"NY","City":"new york"}}
The result of the conversion is below. Not all of the JSON is flattened, which is not what I want; I need every nested field flattened and converted:
id,sex,name,age,info,other,Hobby,connect,Education,School.State,School.City
aa,male,"[{'Fn': 'jeri', 'Ln': 'teri'}]",45,"[{'address': {'State': 'NY', 'City': 'new york'}, 'start_date': '2001-09', 'title': {'name': 'Doctor', 'Exp': 'head'}, 'year': '2001', 'month': '05'}]",,"[{'smoking': None, 'gamble': None}]","[{'phone': '123456789', 'email': '[email protected]'}]",MBA,NY,new york
aa,female,"[{'Fn': 'lo', 'Ln': 'li'}]",34,"[{'address': {'State': 'NY', 'City': 'new york'}, 'start_date': '2008-11', 'title': {'name': 'Doctor', 'Exp': 'hand'}, 'year': '2008', 'month': '02'}]",,"[{'smoking': None, 'gamble': None}]","[{'phone': '123456789', 'email': '[email protected]'}]",MBA,NY,new york
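In the result above, the `School` dict was expanded into `School.State` and `School.City`, while the values wrapped in lists came through as strings. A minimal sketch for checking which columns `pd.json_normalize` left as lists is below. The helper name `list_columns` is made up for illustration, the record is a trimmed copy of the data above, and the email value is a placeholder.

```python
import pandas as pd

# One trimmed record shaped like the data above (email is a placeholder value)
record = {
    "id": "aa",
    "name": [{"Fn": "jeri", "Ln": "teri"}],
    "info": [{"address": {"State": "NY", "City": "new york"}, "year": "2001"}],
    "other": None,
    "Hobby": [{"smoking": None, "gamble": None}],
    "connect": [{"phone": "123456789", "email": "user@example.com"}],
    "School": {"State": "NY", "City": "new york"},
}

df = pd.json_normalize([record])


def list_columns(frame):
    """Return the names of columns where at least one cell holds a Python list."""
    return [
        col for col in frame.columns
        if frame[col].map(lambda v: isinstance(v, list)).any()
    ]


print(list(df.columns))   # dict keys such as School.State are already expanded
print(list_columns(df))   # columns that still need explode() or manual flattening
```

Any column this reports still needs to be unpacked before writing the CSV, for example with `DataFrame.explode` followed by another `pd.json_normalize` call on that column.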
New code:
import pandas as pd
import json

data = []
add_header = True
with open('C:\\Users\\jeri\\Desktop\\1.json', encoding='utf-8') as f_json:
    for line in f_json:
        line = line.strip()
        if line:
            data.append(json.loads(line))

df = pd.json_normalize(data)
# Unwrap each list so every cell holds the dict it contained
dfe = df.explode('name').explode('info').explode('Hobby').reset_index(drop=True)
# Expand those dicts into columns and append them to the frame
dfe = pd.concat([dfe,
                 pd.json_normalize(dfe['name'].tolist()),
                 pd.json_normalize(dfe['info'].tolist()),
                 pd.json_normalize(dfe['Hobby'].tolist())], axis=1)
dfe.to_csv('C:\\Users\\jeri\\Desktop\\11.csv', index=False, encoding='utf-8', header=add_header)
add_header = False
Output:
id,sex,age,other,Education,School,Fn,Ln,start_date,year,month,address.State,address.City,title.name,title.Exp,phone,email,smoking,gamble
aa,male,45,,MBA,"{'State': 'NY', 'City': 'new york'}",jeri,teri,2001-09,2001,05,NY,new york,Doctor,head,123456789,[email protected],,
aa,female,34,,MBA,"{'State': 'NY', 'City': 'new york'}",lo,li,2008-11,2008,02,NY,new york,Doctor,hand,123456789,[email protected],,
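One possible way to flatten every level without naming each column is to walk each record recursively before building the DataFrame. This is only a sketch: `flatten` is a hypothetical helper, not a pandas function, and it assumes that a single-element list should keep its parent's name while longer lists get a numeric index. Column names therefore carry the parent key (for example `name.Fn` rather than `Fn`), which differs from the output above. The sample line is a trimmed version of the data in the question.

```python
import json
import pandas as pd


def flatten(value, prefix="", sep="."):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            flat.update(flatten(child, name, sep))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            # Assumption: a one-element list keeps the parent name,
            # longer lists are numbered (info.0.year, info.1.year, ...)
            name = prefix if len(value) == 1 else f"{prefix}{sep}{i}"
            flat.update(flatten(child, name, sep))
    else:
        # Scalars (including None) become the cell value
        flat[prefix] = value
    return flat


# Trimmed sample in the same JSON Lines layout as the file in the question
lines = [
    '{"id":"aa","name":[{"Fn":"jeri","Ln":"teri"}],'
    '"info":[{"address":{"State":"NY"},"title":{"name":"Doctor"}}],'
    '"other":null,"School":{"State":"NY","City":"new york"}}',
]

rows = [flatten(json.loads(line)) for line in lines if line.strip()]
df = pd.DataFrame(rows)
print(df.columns.tolist())
```

The resulting frame can be written with `df.to_csv(path, index=False, encoding='utf-8')` as in the code above. Empty lists and empty dicts produce no columns in this sketch, so they would need separate handling if they must appear in the CSV.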
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
