Export multiple pandas DataFrames in a single JSON object
I have multiple pandas.DataFrame objects that I would like to dump into a single JSON string.
Let's say that I have the two following dfs:
import pandas as pd
import json
df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)
I want to export them in a single json string as:
{
    "df1": {
        "columns": ["col 1", "col 2"],
        "index": ["row 1", "row 2"],
        "data": [["a", "b"], ["c", "d"]]
    },
    "df2": {
        "columns": ["Col 1", "Col 2", "Col3"],
        "index": ["Row 1", "Row 2"],
        "data": [["A", "B", "C"], ["D", "E", "F"]]
    }
}
My tries
Try 1
If I create a single Python dictionary containing both dataframes and pass it to json.dumps, I get a TypeError, since json does not know how to serialize a pandas.DataFrame:
out = {'df1': df1,
       'df2': df2
       }
out = json.dumps(out)  # <-- Raises TypeError: Object of type DataFrame is not JSON serializable
Try 2
If I serialize each df individually using the pandas.DataFrame.to_json method as
df1_jsonstr = df1.to_json(orient='split')
df2_jsonstr = df2.to_json(orient='split')
out = {'df1': df1_jsonstr,
       'df2': df2_jsonstr
       }
out = json.dumps(out)
The output looks like:
{"df1": "{\"columns\":[\"col 1\",\"col 2\"],\"index\":[\"row 1\",\"row 2\"],\"data\":[[\"a\",\"b\"],[\"c\",\"d\"]]}", "df2": "{\"columns\":[\"Col 1\",\"Col 2\",\"Col3\"],\"index\":[\"Row 1\",\"Row 2\"],\"data\":[[\"A\",\"B\",\"C\"],[\"D\",\"E\",\"F\"]]}"}
Both strings generated by pandas.DataFrame.to_json have been escaped and quoted. When I load the result back with data = json.loads(out), the two dataframes are (correctly) treated as strings and loaded as such, not as nested objects.
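To make the double encoding concrete, here is a minimal sketch that repeats Try 2 with df1 only and inspects what comes back. The variable name inner is introduced purely for illustration.

```python
import json
import pandas as pd

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)

# to_json already returns a JSON string; json.dumps then encodes that
# string a second time, as a JSON string literal.
out = json.dumps({"df1": df1.to_json(orient="split")})
data = json.loads(out)

print(type(data["df1"]).__name__)  # str

# A second json.loads is needed to reach the actual structure.
inner = json.loads(data["df1"])
print(sorted(inner))  # ['columns', 'data', 'index']
```

So the reader of the file would have to decode twice, which is not the layout I am after.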
Try 3
The only way I found to generate the JSON I want is to dump each dataframe to JSON using pandas.DataFrame.to_json, load the results back into dictionaries with json.loads, and then dump them again together. This looks like:
df1_json = df1.to_json(orient='split')
df2_json = df2.to_json(orient='split')
out = {'df1': json.loads(df1_json),
       'df2': json.loads(df2_json)
       }
out = json.dumps(out)
data = json.loads(out)
This works, but it converts the data three times (pd.DataFrame -> str -> dict -> str), which becomes inefficient when df1 and df2 have hundreds of thousands or millions of rows.
Question
Is there a way to achieve the same result as my last example, but performing a single conversion?
Solution 1:[1]
I think you could do something like:
out = """
{
"df1": """ + df1.to_json(orient='split') + """,
"df2": """ + df2.to_json(orient='split') + """
}
"""
or:
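# orient='split' gives the columns/index/data layout requested in the question;
# the default to_dict() orientation would nest the values as {column: {index: value}}.
df1_json = df1.to_dict(orient='split')
df2_json = df2.to_dict(orient='split')
out = {'df1': df1_json,
       'df2': df2_json
       }
out = json.dumps(out)
data = json.loads(out)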
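The string-splicing variant can be generalised to any number of dataframes. The following is only a sketch: frames_to_json is a hypothetical helper name, not part of pandas, and it assumes every value in the mapping is a DataFrame. It also shows one way to rebuild the dataframes after loading, relying on the fact that the split layout uses the same key names as the DataFrame constructor's keyword arguments.

```python
import json
import pandas as pd


def frames_to_json(frames):
    """Splice each DataFrame's to_json output into one JSON object string.

    `frames` maps a name to a DataFrame. Hypothetical helper, not a pandas API.
    """
    parts = []
    for name, df in frames.items():
        # json.dumps quotes and escapes the key; to_json already returns
        # valid JSON text, so it is inserted as-is rather than re-encoded.
        parts.append(json.dumps(str(name)) + ": " + df.to_json(orient="split"))
    return "{" + ", ".join(parts) + "}"


df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)

out = frames_to_json({"df1": df1, "df2": df2})

# Each loaded value is a dict with "columns", "index" and "data" keys,
# which can be passed straight to the DataFrame constructor.
data = json.loads(out)
restored = {name: pd.DataFrame(**spec) for name, spec in data.items()}
print(restored["df2"])
```

With this approach each dataframe is serialized once by pandas and only the small outer wrapper is assembled by hand. The to_dict variant is simpler to read, but json.dumps may not accept every value type a dataframe can hold (timestamps, for example), whereas to_json handles those conversions itself.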
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
