'How to read list of json objects from Pandas DataFrame?
I want just want to loop through the array of json objects, and get the values of 'box'.....
I have a DataFrame which looks like this
img facesJson
0 2b26mn4.jpg [{'box': [57, 255, 91, 103], 'confidence': 0.7...
1 cd7ntf.jpg [{'box': [510, 85, 58, 87], 'confidence': 0.99...
2 m9kf3e.jpg [{'box': [328, 78, 93, 123], 'confidence': 0.9...
3 b4hx0n.jpg [{'box': [129, 30, 38, 54], 'confidence': 0.99...
4 afx0fm.jpg [{'box': [86, 126, 221, 298], 'confidence': 0....
and the column 'facesJson' (dstype = object) contain array of json objects which look like this:
[
{
"box":[ 158,115,84,112 ],
"confidence":0.9998929500579834,
},
{
"box":[ 404,105, 86,114 ],
"confidence":0.9996863603591919,
}
]
when i run this code
for index,row in df.iterrows():
df = pd.json_normalize(row['facesJson'])
print(len(df))
i get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-53-8be848c6d44a> in <module>()
1 for index,row in savedList.iterrows():
----> 2 df = pd.json_normalize(row['facesJson'])
3 print(len(df))
1 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in <genexpr>(.0)
272
273 if record_path is None:
--> 274 if any([isinstance(x, dict) for x in y.values()] for y in data):
275 # naive normalization, this is idempotent for flat records
276 # and potentially will inflate the data considerably for
AttributeError: 'str' object has no attribute 'values'
Solution 1:[1]
You can use this code on loop for each item of that column.
import ast
line='[ { "box":[ 158,115,84,112 ], "confidence":0.9998929500579834, }, { "box":[ 404,105, 86,114 ], "confidence":0.9996863603591919, } ]'
parsed=ast.literal_eval(line)
print(parsed[0].keys())
Solution 2:[2]
I would suggest using a list comprehension to look up the value of 'box' in each JSON object:
df['facesJson'].apply(lambda json: [item['box'] for item in json])
Solution 3:[3]
That something I run into, it's may help someone, so I have data as follow:
txt = """{
"timeStamp": 1,
"src": [
{ "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 },
{ "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 },
{ "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 },
{ "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 }
]
}"""
I need to convert the "src" into an array as follow:
array([[0, '', 0.0, 0.0, 0.0, 0.0],
[0, '', 0.0, 0.0, 0.0, 0.0],
[0, '', 0.0, 0.0, 0.0, 0.0],
[0, '', 0.0, 0.0, 0.0, 0.0]], dtype=object)
here is how I did it:
import numpy as np
import pandas as pd
import json
from box import Box
dt = Box(json.loads(txt))
df = pd.DataFrame(dt.src)
ar = np.array(df)
if you print the df.head() you will get:
id tag x y z activity 0 0 '' 0.0 0.0 0.0 0.0 1 0 '' 0.0 0.0 0.0 0.0 2 0 '' 0.0 0.0 0.0 0.0 3 0 '' 0.0 0.0 0.0 0.0
if there a more simple way to do, please let me know.
Solution 4:[4]
Figured out a pretty simple case when you want the entire dataframe to be turned into a list of json objects representing a row
import json
import pandas as pd
df = pd.DataFrame([{'col1':1,'col2':2},{'col1':3,'col2':4}])
json_list = json.loads(df.to_json(orient='records'))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ali Adhami |
| Solution 2 | stinodego |
| Solution 3 | Walid Bousseta |
| Solution 4 | Inwon Kang |
