How to create a dataclass with optional fields that outputs a field in JSON only if the field is not None

I am unclear about how to use a @dataclass to convert a mongo doc into a Python dataclass. My NoSQL documents may or may not contain some of the fields. I only want to output a field (using asdict) from the dataclass if that field was present in the mongo document.

Is there a way to create a field that will be output with dataclasses.asdict only if it exists in the mongo doc?

I have tried using post_init but have not figured out a solution.

import json
from dataclasses import InitVar, asdict, dataclass

# in this example I want to output the 'author' field ONLY if it is present in the mongo document
@dataclass
class StoryTitle:
    _id: str 
    title: str
    author: InitVar[str] = None
    dateOfPub: int = None

    def __post_init__(self, author):
        print(f'__post_init__ got called....with {author}')
        if author is not None:
            self.newauthor = author
            print(f'self.author is now {self.newauthor}')

# foo and bar approximate documents in mongodb
foo = dict(_id='b23435xx3e4qq', title = 'goldielocks and the big bears', author='mary', dateOfPub = 220415)

newFoo = StoryTitle(**foo)
json_foo = json.dumps(asdict(newFoo))
print(json_foo)

bar = dict(_id='b23435xx3e4qq', title = 'War and Peace', dateOfPub = 220415)
newBar = StoryTitle(**bar)
json_bar = json.dumps(asdict(newBar))
print(json_bar)

My output json does not (of course) have the 'author' field. Anyone know how to accomplish this? I suppose I could just create my own asdict method ...



Solution 1:[1]

Unfortunately, the dataclasses.asdict helper function has no built-in option to exclude fields with default or uninitialized values. The dataclass-wizard library does.

The dataclass-wizard is a (de)serialization library I've created, built on top of the dataclasses module. It adds no extra dependencies outside of the stdlib, apart from the typing-extensions module for compatibility with earlier Python versions.

To skip dataclass fields with default or uninitialized values during serialization, for example with asdict, the dataclass-wizard provides the skip_defaults option. However, there is also a minor issue with your code above. If the default for the author field is None, we cannot distinguish between an explicit null value and the case where the author field is absent when de-serializing the JSON data.
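To make that ambiguity concrete, here is a minimal stdlib-only sketch. It assumes a plain dataclass with None defaults and a hypothetical drop_none helper passed to asdict through its dict_factory parameter. A document without an author and a document with author set to null end up serialized identically:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StoryTitle:
    _id: str
    title: str
    author: Optional[str] = None
    dateOfPub: Optional[int] = None


def drop_none(pairs):
    # hypothetical helper: asdict passes a list of (field_name, value) tuples
    # to dict_factory, so None values can be filtered out here
    return {name: value for name, value in pairs if value is not None}


# author key absent from the document
absent = StoryTitle(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415)
# author key present, but explicitly null
explicit_null = StoryTitle(_id='b23435xx3e4qq', title='War and Peace', author=None, dateOfPub=220415)

json_absent = json.dumps(asdict(absent, dict_factory=drop_none))
json_null = json.dumps(asdict(explicit_null, dict_factory=drop_none))

print(json_absent)
print(json_null)
```

Both lines print the same JSON with no author key, so the information that the second document carried an explicit null is lost.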

So in the example below, I've created a CustomNull object similar to the None singleton in Python. The name and implementation don't matter much; we use it as a sentinel object to determine whether a value for author was passed in. If it is not present in the input data when from_dict is called, then we simply exclude it when serializing data with to_dict or asdict, as shown below.

from __future__ import annotations  # can be removed in Python 3.10+

from dataclasses import dataclass
from dataclass_wizard import JSONWizard


# create our own custom `NoneType` class
class CustomNullType:
    # these methods are not really needed, but useful to have.
    def __repr__(self):
        return '<null>'

    def __bool__(self):
        return False


# this is analogous to the builtin `None = NoneType()`
CustomNull = CustomNullType()


# in this example I want to output the 'author' field ONLY if it is present in the mongo document
@dataclass
class StoryTitle(JSONWizard):

    class _(JSONWizard.Meta):
        # skip default values for dataclass fields when `to_dict` is called
        skip_defaults = True

    _id: str
    title: str
    # note: we could also define it like
    #  author: str | None = None
    # however, using that approach we won't know if the value is
    # populated as a `null` when de-serializing the json data.
    author: str | None = CustomNull
    # by default, the `dataclass-wizard` library uses a regex to convert
    # json field names to snake case, and caches the field name for next time.
    # dateOfPub: int = None
    date_of_pub: int | None = None


# foo and bar approximate documents in mongodb
foo = dict(_id='b23435xx3e4qq', title='goldielocks and the big bears', author='mary', dateOfPub=220415)

new_foo = StoryTitle.from_dict(foo)
json_foo = new_foo.to_json()
print(json_foo)

bar = dict(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415)
new_bar = StoryTitle.from_dict(bar)
json_bar = new_bar.to_json()
print(json_bar)

# lastly, we try de-serializing with `author=null`. the `author` field should still
# be populated when serializing the instance, as it was present in input data.
bar = dict(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415, author=None)
new_bar = StoryTitle.from_dict(bar)
json_bar = new_bar.to_json()
print(json_bar)

Output:

{"_id": "b23435xx3e4qq", "title": "goldielocks and the big bears", "author": "mary", "dateOfPub": 220415}
{"_id": "b23435xx3e4qq", "title": "War and Peace", "dateOfPub": 220415}
{"_id": "b23435xx3e4qq", "title": "War and Peace", "author": null, "dateOfPub": 220415}

Note: the dataclass-wizard can be installed with pip:

$ pip install dataclass-wizard
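
If adding a dependency is not an option, the same sentinel idea can be sketched with the standard library alone. This is only an illustration under a few assumptions: MissingType, MISSING, drop_missing and to_json are hypothetical names, the field types are loosened to Any, and the field names are kept exactly as they appear in the mongo document because no case transformation is done.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


class MissingType:
    """Hypothetical sentinel type meaning 'key absent from the document'."""

    def __repr__(self):
        return '<missing>'

    def __deepcopy__(self, memo):
        # asdict() deep-copies field values; returning self keeps
        # the `is MISSING` identity check below working
        return self


MISSING = MissingType()


def drop_missing(pairs):
    # asdict() passes a list of (field_name, value) tuples to dict_factory
    return {name: value for name, value in pairs if value is not MISSING}


@dataclass
class StoryTitle:
    _id: str
    title: str
    author: Any = MISSING     # str, None, or MISSING
    dateOfPub: Any = MISSING  # int, None, or MISSING


def to_json(story):
    return json.dumps(asdict(story, dict_factory=drop_missing))


foo = dict(_id='b23435xx3e4qq', title='goldielocks and the big bears', author='mary', dateOfPub=220415)
bar = dict(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415)
baz = dict(bar, author=None)  # author present, but null

json_foo = to_json(StoryTitle(**foo))
json_bar = to_json(StoryTitle(**bar))
json_baz = to_json(StoryTitle(**baz))

print(json_foo)
print(json_bar)
print(json_baz)
```

Here the author key is dropped only for bar, where it was never supplied; baz keeps "author": null because None is a real value rather than the sentinel. Unlike the library approach, this sketch does no type conversion or key renaming on input.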

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1