Python: convert child dicts with the same key name to CSV (DictWriter)
I have a JSON file containing a list of objects like below:
[
{"A":{"value":1}, "B":{"value":2}},
{"A":{"value":9}, "B":{"value":3}}
]
Which I want to turn to csv like so:
A.value,B.value
1,2
9,3
The issue is that the nested keys share the same name, value, but should end up in separate columns. I could not find an elegant solution for this anywhere yet. I would like to be able to do something like:
import csv
import json

data = json.load(open(file, 'r'))
with open("output.csv", "w") as f:
    columns = ["A.value", "B.value"]
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(data)
I know this would work if I did not have any nested keys. I found other questions similar to this, but I don't think they apply to my situation.
As an extra challenge:
I'd rather keep a generic approach. Later I might have a list of JSON objects like:
[
{"A":{"value":1}, "B":{"value":2}, "key":90},
{"A":{"value":9}, "B":{"value":3}, "key":91}
]
Meaning not all keys I want to add to the CSV will have a nested value key!
Expected output:
A.value,B.value,key
1,2,90
9,3,91
Solution 1:[1]
Flattening the dicts worked. Since there is a list of dicts, the flattening has to be done for each dict:
import collections.abc

def flatten(d, parent_key='', sep='.'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

def flatten_json_list(json_list):
    flattened_list = []
    for d in json_list:
        flattened_list.append(flatten(d))
    return flattened_list

Note that on Python 3.10+ the abstract base classes must be imported from collections.abc; collections.MutableMapping no longer exists. The parameter is named json_list to avoid shadowing the built-in list.
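For example (a self-contained sketch repeating flatten from above, applied to the extra-challenge data from the question):

```python
from collections.abc import MutableMapping

def flatten(d, parent_key='', sep='.'):
    # Recursively turn nested dicts into dotted keys,
    # e.g. {'A': {'value': 1}} -> {'A.value': 1}.
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

data = [{"A": {"value": 1}, "B": {"value": 2}, "key": 90},
        {"A": {"value": 9}, "B": {"value": 3}, "key": 91}]
flat = [flatten(d) for d in data]
print(flat[0])  # {'A.value': 1, 'B.value': 2, 'key': 90}
```

Non-dict values such as key pass through unchanged, so the mixed case from the extra challenge is handled for free.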
Then the code from the question works, as long as it is given the flattened data:

import csv

data = flatten_json_list(data)
with open("out.csv", "w") as f:
    columns = ["A.value", "B.value", "key"]
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(data)
Solution 2:[2]
This should do the job:
cw.writerows([{f'{key}.value': val['value'] for key, val in row.items()} for row in data])
or as regular loop:
for row in data:
    cw.writerow({f'{key}.value': val['value'] for key, val in row.items()})
EDIT: to also handle keys without a nested value dict (the extra-challenge case):
import csv

data = [
    {"A": {"value": 1}, "B": {"value": 2}, "key": 90},
    {"A": {"value": 9}, "B": {"value": 3}, "key": 91}
]

def parse(row):
    for key, value in row.items():
        try:
            yield f'{key}.value', value['value']
        except TypeError:
            yield key, value

with open("output.csv", "w") as f:
    columns = ['A.value', 'B.value', 'key']
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(dict(parse(row)) for row in data)
The only thing I don't like is the hardcoded headers.
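One way around the hardcoded headers (a sketch not in the original answer; the names rows and columns are mine) is to parse every row first and derive the fieldnames from the union of the resulting keys:

```python
import csv

data = [{"A": {"value": 1}, "B": {"value": 2}, "key": 90},
        {"A": {"value": 9}, "B": {"value": 3}, "key": 91}]

def parse(row):
    # Replace nested {'value': ...} dicts with a dotted key; pass scalars through.
    for key, value in row.items():
        try:
            yield f'{key}.value', value['value']
        except TypeError:
            yield key, value

rows = [dict(parse(row)) for row in data]
# Union of keys across all rows, preserving first-seen order.
columns = list(dict.fromkeys(k for row in rows for k in row))

with open("output.csv", "w", newline="") as f:
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(rows)
```

Collecting keys from every row (rather than just the first) also covers input where later rows introduce extra keys; DictWriter would otherwise raise a ValueError on the unexpected fieldname.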
Solution 3:[3]
jsonpath-ng can parse even deeply nested JSON objects very easily. It can be installed with the following command:
pip install --upgrade jsonpath-ng
Code:
from collections import defaultdict
import jsonpath_ng as jp
import pandas as pd
import re

jp.jsonpath.auto_id_field = 'json_path'

def json_to_df(json):
    expr = jp.parse(f'$..*.{jp.jsonpath.auto_id_field}')
    d = defaultdict(list)
    for m in expr.find(json):
        if not isinstance(m.datum.value, (dict, list)):
            d[re.sub(r'\[\d+]\.', '', m.value)].append(m.datum.value)
    return pd.DataFrame(d)

data = [{"A": {"value": 1}, "B": {"value": 2}, "key": 90},
        {"A": {"value": 9}, "B": {"value": 3}, "key": 91}]

df = json_to_df(data)
df.to_csv('output.csv', index=False)
Output:
| key | A.value | B.value |
|---|---|---|
| 90 | 1 | 2 |
| 91 | 9 | 3 |
Another, more deeply nested example:

data = [{"A": {"value": 1}, "B": {"value": 2, "C": {"value": 6}}, "key": 90},
        {"A": {"value": 9}, "B": {"value": 3, "C": {"value": 8}}, "key": 91}]

json_to_df(data).to_csv('output.csv', index=False)
| key | A.value | B.value | B.C.value |
|---|---|---|---|
| 90 | 1 | 2 | 6 |
| 91 | 9 | 3 | 8 |
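As an aside not covered by the answers above: pandas also ships its own flattener, json_normalize, which produces the same dotted columns directly (a sketch; column order may differ from the hand-built headers):

```python
import pandas as pd

data = [{"A": {"value": 1}, "B": {"value": 2}, "key": 90},
        {"A": {"value": 9}, "B": {"value": 3}, "key": 91}]

# sep='.' joins nested keys with a dot, e.g. 'A.value'.
df = pd.json_normalize(data, sep='.')
df.to_csv('output.csv', index=False)
```

This handles mixed scalar and nested keys without any custom code, at the cost of pulling in pandas as a dependency.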
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | bcsta |
| Solution 2 | |
| Solution 3 | quasi-human |
