How to tag nodes implicitly in YAML (PyYAML)
Consider this YAML file:
!my-type
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43
There is one top level object that contains a collection of dictionaries, and I can load it fine with PyYAML. Now, I want to use a proper class instead of these item dictionaries:
!my-type
name: My type
items:
  - !my-type-item
    name: First item
    number: 42
  - !my-type-item
    name: Second item
    number: 43
But this syntax is cumbersome and redundant, since all items in this collection are of the same type. And it gets very ugly when there are hundreds of these items. Is it possible to tag these items implicitly?
I considered using yaml.add_path_resolver but this API does not seem to be public or stable.
Solution 1:[1]
The YAML spec says
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node and (3) the content (and hence the kind) of the node.
which means you are in accordance with the spec when you do this. I guess this is what add_path_resolver tries to implement.
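For illustration only, here is a minimal sketch of how a path resolver could tag the items from their position in the document. It assumes that add_path_resolver behaves as in current PyYAML releases, which is exactly the stability concern raised in the question. ItemLoader, Item and construct_item are hypothetical names, and the root tag is left out to keep the sketch short.

```python
import yaml


class Item:
    def __init__(self, name, number):
        self.name, self.number = name, number


class ItemLoader(yaml.SafeLoader):
    """Hypothetical loader subclass, so the resolver is not added to SafeLoader itself."""


def construct_item(loader, node):
    # Build a plain dict from the mapping node, then wrap it in Item.
    fields = loader.construct_mapping(node, deep=True)
    return Item(**fields)


# Assumption: any mapping found at <root>["items"][<any index>] gets this tag.
ItemLoader.add_path_resolver("!my-type-item", ["items", None], dict)
ItemLoader.add_constructor("!my-type-item", construct_item)

doc = """
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43
"""

data = yaml.load(doc, Loader=ItemLoader)
```

This keeps the YAML free of item tags, but it still needs one constructor per tag and relies on an API whose status is unclear.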
The problem here is that Python does not have classes with declared, typed fields. Languages that have those can inspect them and load data with the proper type implicitly (done by SnakeYAML, go-yaml et al.). With PyYAML, to do this you'll need to implement a custom constructor, e.g.:
import yaml


def get_value(node, name):
    # Return the value node stored under key `name` in a mapping node.
    assert isinstance(node, yaml.MappingNode)
    for key, value in node.value:
        assert isinstance(key, yaml.ScalarNode)
        if key.value == name:
            return value


class MyTypeItem:
    def __init__(self, name, number):
        self.name, self.number = name, number

    @classmethod
    def from_yaml(cls, loader, node):
        name = get_value(node, "name")
        assert isinstance(name, yaml.ScalarNode)
        number = get_value(node, "number")
        assert isinstance(number, yaml.ScalarNode)
        # Scalar node values are strings, so convert the number by hand.
        return cls(name.value, int(number.value))

    def __repr__(self):
        return f"MyTypeItem(name={self.name}, number={self.number})"


class MyType(yaml.YAMLObject):
    yaml_tag = "!my-type"

    def __init__(self, name, items):
        self.name, self.items = name, items

    @classmethod
    def from_yaml(cls, loader, node):
        name = get_value(node, "name")
        assert isinstance(name, yaml.ScalarNode)
        items = get_value(node, "items")
        assert isinstance(items, yaml.SequenceNode)
        # Build every item explicitly; the items carry no tag of their own.
        return cls(name.value,
                   [MyTypeItem.from_yaml(loader, n) for n in items.value])

    def __repr__(self):
        return f"MyType(name={self.name}, items={self.items})"


document = """
!my-type
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43
"""

print(yaml.load(document, yaml.FullLoader))
This gives you:
MyType(name=My type, items=[MyTypeItem(name=First item, number=42), MyTypeItem(name=Second item, number=43)])
Only the top-level class derives from yaml.YAMLObject and has a yaml_tag, so that PyYAML picks it for the root item based on its tag. MyTypeItem.from_yaml is called explicitly from MyType.from_yaml and thus doesn't need to be registered with PyYAML (you can still register it if you also want to load files that contain such an item directly).
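If you do want to load a file whose root is a single item, one way to register the item class is sketched below. It is a simplified variant, not the code from above: it uses loader.construct_mapping instead of get_value, and ItemLoader is a hypothetical loader subclass introduced to keep the registration local.

```python
import yaml


class MyTypeItem:
    def __init__(self, name, number):
        self.name, self.number = name, number

    @classmethod
    def from_yaml(cls, loader, node):
        # construct_mapping resolves scalars for us, so number is already an int.
        fields = loader.construct_mapping(node)
        return cls(fields["name"], fields["number"])


class ItemLoader(yaml.SafeLoader):
    """Hypothetical loader subclass that knows about !my-type-item."""


# The bound classmethod has the (loader, node) signature PyYAML expects.
ItemLoader.add_constructor("!my-type-item", MyTypeItem.from_yaml)

item = yaml.load("!my-type-item\nname: First item\nnumber: 42\n", Loader=ItemLoader)
```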
You need to do conversions to non-string values manually (as shown with int(number.value)) since .value of any scalar node is always a string.
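A quick way to see this is to compose a document into nodes without constructing Python objects. The sketch below uses yaml.compose; the one-line document is made up for the demonstration.

```python
import yaml

# compose() stops at the node graph; no Python objects are constructed.
root = yaml.compose("number: 42", Loader=yaml.SafeLoader)
key_node, value_node = root.value[0]

raw = value_node.value   # the scalar's text, still a string
tag = value_node.tag     # the tag PyYAML resolved for this scalar
number = int(raw)        # manual conversion, as in from_yaml above
```

The resolved tag tells you which type PyYAML would have produced, but the node itself only carries the text.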
Solution 2:[2]
To make it easier on yourself, I would suggest using dataclasses along with the dataclass-wizard library for a high-level approach.
Here's an approach using YAMLWizard and the PyYAML library for parsing YAML to a nested dataclass structure:
from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import YAMLWizard


@dataclass
class MyContainer(YAMLWizard):
    name: str
    items: list[MyItem]


@dataclass
class MyItem:
    name: str
    number: int


if __name__ == '__main__':
    yaml = """
    name: My type
    items:
      - name: First item
        number: 42
      - name: Second item
        number: 43
    """
    c = MyContainer.from_yaml(yaml)
    print(c)
Output:
MyContainer(name='My type', items=[MyItem(name='First item', number=42), MyItem(name='Second item', number=43)])
Note: This requires the yaml extra, which then brings in the PyYAML dependency:
$ pip install dataclass-wizard[yaml]
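If adding another dependency is not an option, a similar result can be approximated with plain dataclasses and yaml.safe_load. This is a swapped-in technique, not what dataclass-wizard does internally: load_container is a hypothetical helper, and nothing here validates or converts field types.

```python
from dataclasses import dataclass
from typing import List

import yaml


@dataclass
class MyItem:
    name: str
    number: int


@dataclass
class MyContainer:
    name: str
    items: List[MyItem]


def load_container(text):
    # safe_load yields plain dicts and lists; wrap them by hand.
    raw = yaml.safe_load(text)
    return MyContainer(
        name=raw["name"],
        items=[MyItem(**entry) for entry in raw["items"]],
    )


container = load_container("""
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43
""")
```

The wrapping has to be written once per container class, which is the part dataclass-wizard takes off your hands.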
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | flyx |
| Solution 2 | rv.kvetch |
