'How create dataframe from list of dictionary of multi level json
So I have a json file that have multiple level,by using pandas I can read the first level and put it in a dataframe,but the problem is as you can see in the dataframe Column Comments and hastags the second level is inside a column have format of list of dictionary,is there any solution for make the second level dictionary into dataframe. I try to use for loop and json_normalize but it always throw an error. Any suggestion? My code is like this
import pandas as pd
df2 = pd.read_json("data.json")
cid = []
for x in df2["comments"]:
cid.append(x.get('cid'))
data = pd.DataFrame({'cid':cid})
If i use the code it throw an error since I try access list by string not index.
AttributeError: 'list' object has no attribute 'get'
Even I change it into integer it got dictionary inside a column.How to change the dict inside the column or is there another easier way to do this? Dictionary in column
for x in df2["comments"]:
cid.append(x[0])
data = pd.DataFrame({'cid':cid})
for y in data:
print(y.get('cid'))
Example of first row of the data frame
[{'cid': '7000061798167266075', 'createTime': 1629828926, 'text': 'Done 🥰', 'diggCount': 1, 'replyCommentTotal': 0, 'uid': '6529379534374092801', 'uniqueId': 'alikayanti'}, {'cid': '6999869922566783771', 'createTime': 1629784228, 'text': 'haloo min, yg udah ikutan di misi sebelumnya boleh ikutan lagi gaa?', 'diggCount': 1, 'replyCommentTotal': 1, 'uid': '6842932775562642433', 'uniqueId': 'fia_654'}, {'cid': '7000248857603588891', 'createTime': 1629872457, 'text': 'bell bottoms maksudnya apa kak?\napakah benar artinya bel bawah?', 'diggCount': 0, 'replyCommentTotal': 2, 'uid': '6960940768727417857', 'uniqueId': 'keterimadiptn1'}, {'cid': '7000322023545455387', 'createTime': 1629889491, 'text': 'syudah🥰', 'diggCount': 0, 'replyCommentTotal': 0, 'uid': '6806645499672839170', 'uniqueId': 'miftahulhaqqu'}, {'cid': '7001271117180977947', 'createTime': 1630110475, 'text': 'kak, perpanjang dong waktu posting videonya :)', 'diggCount': 1, 'replyCommentTotal': 0, 'uid': '6921267441846830082', 'uniqueId': 'elisabetkst'}]
Solution 1:[1]
Maybe this solves your problem:
Defined the following function which unnests any json:
import json
import pandas as pd
def flatten_nested_json_df(df):
df = df.reset_index()
s = (df.applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df.applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
while len(list_columns) > 0 or len(dict_columns) > 0:
new_columns = []
for col in dict_columns:
horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
horiz_exploded.index = df.index
df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
new_columns.extend(horiz_exploded.columns) # inplace
for col in list_columns:
#print(f"exploding: {col}")
df = df.drop(columns=[col]).join(df[col].explode().to_frame())
new_columns.append(col)
s = (df[new_columns].applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df[new_columns].applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
return df
and do this:
results = pd.json_normalize(data)
df = pd.DataFrame(results)
outdf = flatten_nested_json_df(df)
which returns:
index cid createTime \
0 0 7000061798167266075 1629828926
1 1 6999869922566783771 1629784228
2 2 7000248857603588891 1629872457
3 3 7000322023545455387 1629889491
4 4 7001271117180977947 1630110475
text diggCount \
0 Done ? 1
1 haloo min, yg udah ikutan di misi sebelumnya b... 1
2 bell bottoms maksudnya apa kak?\napakah benar ... 0
3 syudah? 0
4 kak, perpanjang dong waktu posting videonya :) 1
replyCommentTotal uid uniqueId
0 0 6529379534374092801 alikayanti
1 1 6842932775562642433 fia_654
2 2 6960940768727417857 keterimadiptn1
3 0 6806645499672839170 miftahulhaqqu
4 0 6921267441846830082 elisabetkst
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Serge de Gosson de Varennes |
