Serialize dataclass with class as a field to JSON in Python
On a project I'm contributing to, we have a very simple but important class (let's call it LegacyClass). Modifying it would be a long process.
I'm contributing new dataclasses (like NormalDataclass) to this project, and I need to be able to serialize them to JSON.
I don't have access to the JSON encoder, so I cannot specify a custom encoder.
Here is some sample code:
import dataclasses
import collections
import json
#region I cannot easily change this code
class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __iter__(self):
        yield self.a
        yield self.b
    def __repr__(self):
        return f"({self.a}, {self.b})"
#endregion
#region I can do whatever I want to this part of code
@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass
legacy_class = LegacyClass('a', 'b')
normal_dataclass = NormalDataclass(legacy_class)
normal_dataclass_dict = dataclasses.asdict(normal_dataclass)
#endregion
#region I cannot easily change this code
json.dumps(normal_dataclass_dict)
#endregion
What I want to get:
{"legacy_class": {"a": "a", "b": "b"}}
What I'm getting:
TypeError: Object of type LegacyClass is not JSON serializable
Do you have any suggestions?
Specifying dict_factory as an argument to dataclasses.asdict would be an option if there weren't multiple levels of LegacyClass nesting, e.g.:
@dataclasses.dataclass
class AnotherNormalDataclass:
    custom_class: List[Tuple[int, LegacyClass]]
Making dict_factory recursive would basically mean rewriting the dataclasses.asdict implementation.
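For the flat case, a non-recursive dict_factory along these lines would be enough (a minimal sketch; the legacy_aware_factory name is made up here). It never reaches LegacyClass instances nested inside lists or tuples, which is exactly the limitation described above:
import dataclasses
def legacy_aware_factory(items):
    # convert top-level LegacyClass values via their __dict__;
    # values nested inside lists/tuples never pass through this hook
    return {k: (v.__dict__ if isinstance(v, LegacyClass) else v) for k, v in items}
print(dataclasses.asdict(normal_dataclass, dict_factory=legacy_aware_factory))
# {'legacy_class': {'a': 'a', 'b': 'b'}}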
Solution 1:[1]
Edit: The simplest solution, based on the most recent edit to the question above, would be to define your own dict() method which returns a JSON-serializable dict object. Though in the long term, I'd probably suggest contacting the team who implements the json.dumps part, to see if they can update the encoder implementation for the dataclass.
In any case, here's a working example you can use for the present scenario:
import dataclasses
import collections
import json
class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __iter__(self):
        yield self.a
        yield self.b
    def __repr__(self):
        return f"({self.a}, {self.b})"
@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass
    def dict(self):
        return {'legacy_class': self.legacy_class.__dict__}
legacy_class = LegacyClass('a', 'b')
normal_dataclass = NormalDataclass(legacy_class)
normal_dataclass_dict = normal_dataclass.dict()
print(normal_dataclass_dict)
json.dumps(normal_dataclass_dict)
Output:
{'legacy_class': {'a': 'a', 'b': 'b'}}
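If the team that owns the json.dumps call is open to changes, the "update the encoder" suggestion above could look roughly like this (a sketch; LegacyAwareEncoder is a made-up name, and it simply falls back to an object's __dict__):
import dataclasses
import json
class LegacyAwareEncoder(json.JSONEncoder):
    def default(self, o):
        # fall back to the instance __dict__ for plain objects like LegacyClass
        if hasattr(o, '__dict__'):
            return o.__dict__
        # keep the usual TypeError for everything else
        return super().default(o)
# reusing normal_dataclass from the example above
print(json.dumps(dataclasses.asdict(normal_dataclass), cls=LegacyAwareEncoder))
# {"legacy_class": {"a": "a", "b": "b"}}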
You should also be able to pass a default argument to json.dumps, which will be called whenever the encoder finds an object that it can't serialize to JSON, for example a custom class instance or a datetime object.
For example:
import dataclasses
import collections
import json
class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __iter__(self):
        yield self.a
        yield self.b
    def __repr__(self):
        return f"({self.a}, {self.b})"
@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass
legacy_class = LegacyClass('aa', 'bb')
normal_dataclass = NormalDataclass(legacy_class)
normal_dataclass_dict = dataclasses.asdict(normal_dataclass)
o = json.dumps(normal_dataclass_dict,
               ### ADDED ###
               default=lambda o: o.__dict__)
print(o) # {"legacy_class": {"a": "aa", "b": "bb"}}
If you have a more complex use case, you could consider creating a default function which can check the type of each value as it gets serialized to JSON:
import dataclasses
import collections
import json
from datetime import date, time
from typing import Any
class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __iter__(self):
        yield self.a
        yield self.b
    def __repr__(self):
        return f"({self.a}, {self.b})"
@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass
    my_date: date = date.min
legacy_class = LegacyClass('aa', 'bb')
normal_dataclass = NormalDataclass(legacy_class)
normal_dataclass_dict = dataclasses.asdict(normal_dataclass)
def default_func(o: Any):
    # it's a date, time, or datetime (datetime is a subclass of date)
    if isinstance(o, (date, time)):
        return o.isoformat()
    # it's a class instance with a `__dict__` attribute
    if isinstance(type(o), type) and hasattr(o, '__dict__'):
        return o.__dict__
    # print a warning and return a null
    print(f'couldn\'t find an encoder for: {o!r}, type={type(o)}')
    return None
o = json.dumps(normal_dataclass_dict, default=default_func)
print(o) # {"legacy_class": {"a": "aa", "b": "bb"}, "my_date": "0001-01-01"}
Solution 2:[2]
If you don't mind using a third-party dependency, you can solve the problem with mashumaro. You just need to add the DataClassDictMixin base class and register custom serialization/deserialization methods for LegacyClass:
import dataclasses
import json
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
# LegacyClass is defined exactly as in the question above
@dataclasses.dataclass
class NormalDataclass(DataClassDictMixin):
    legacy_class: LegacyClass
    class Config(BaseConfig):
        serialization_strategy = {
            LegacyClass: {
                "serialize": lambda x: {"a": x.a, "b": x.b},
                "deserialize": lambda d: LegacyClass(d["a"], d["b"]),
            }
        }
legacy_class = LegacyClass('a', 'b')
normal_dataclass = NormalDataclass(legacy_class)
normal_dataclass_dict = normal_dataclass.to_dict()
print(normal_dataclass_dict)
s = json.dumps(normal_dataclass_dict)
print(s)
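Because a "deserialize" hook is registered as well, the round trip back from JSON works via the from_dict method that DataClassDictMixin provides (a brief usage sketch):
restored = NormalDataclass.from_dict(json.loads(s))
print(restored.legacy_class)  # (a, b)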
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | tikhonov_a |
