Dumping with ruamel.yaml while preserving the original structure

I have this YAML file (file_in.yaml):

components: &base
  type1: 3  # sample comment
  type2: 0x353
  type3: 1.2.3.4
  type4: "bla"
  schemas:
    description: 'ex1' # this is comment
    description: 'ex2'

test2:
  <<: *base
  type4: 4555  # under the sea :)

yellow: &yellow
  bla: 1

collor: &yellow
  bla: 2

paint:
  color: *yellow

slot_si_value_t: &bla1
    desc: hav slot 2
    slot_number: 2 #SI_SLOT_2
    inst_max: 4

slot_si_value_t: &bla2
    desc: hav slot 4
    slot_number: 4 #SI_SLOT_4
    inst_max: 4

slot:
  - slot_si_value: *bla1
  - slot_si_value: *bla2

I load it with this code snippet and dump it into another file:

import ruamel.yaml

ryaml = ruamel.yaml.YAML()
ryaml.allow_duplicate_keys = True
ryaml.preserve_quotes = True

with open("file_in.yaml") as si_file:
    si_data = ryaml.load(si_file)

with open("file_out.yaml", "w") as fp:
    ryaml.dump(si_data, fp)

The resulting file_out.yaml looks like this:

components: &base
  type1: 3  # sample comment
  type2: 0x353
  type3: 1.2.3.4
  type4: "bla"
  schemas:
    description: 'ex1' # this is comment

test2:
  <<: *base
  type4: 4555  # under the sea :)

yellow:
  bla: 1

collor: &yellow
  bla: 2

paint:
  color: *yellow

slot_si_value_t: &bla1
  desc: hav slot 2
  slot_number: 2   #SI_SLOT_2
  inst_max: 4

slot:
- slot_si_value: *bla1
- slot_si_value:
    desc: hav slot 4
    slot_number: 4 #SI_SLOT_4
    inst_max: 4

I can see that the comments, quotes, hex values and key order are preserved; however, the structure of the YAML has changed. Is there any way to instruct ruamel.yaml to dump the exact original format?

Here is a side-by-side comparison of the two files (image not available):



Solution 1:[1]

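You can preserve all anchors by replacing the .yaml_set_anchor method of CommentedBase with one whose always_dump parameter defaults to True. For historical reasons, ruamel.yaml currently does this only for anchored scalar values, so the behavior is inconsistent: an anchor on a mapping or sequence is dumped only if something in the loaded data still aliases it.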

Having duplicate keys in your YAML mappings, however, makes the document invalid, because the YAML specification requires the keys of a mapping to be unique. ruamel.yaml lets you load such broken documents if you set .allow_duplicate_keys, but it does not support writing them: during loading it keeps only the first occurrence of a key in a mapping and disregards the rest, so you cannot access the values of the later occurrences unless they are anchored and aliased somewhere else.

That is why you "lose" description: 'ex2' under the key schemas, including the empty line that follows it (which is part of that entry's "comment").

The second occurrence of slot_si_value_t in the root mapping causes more problems. Because that key is not preserved, the mapping anchored as &bla2 exists only once in the loaded data, inside the sequence that is the value of slot, and that is where it gets dumped (with its anchor, because of the .yaml_set_anchor change).

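import sys
import warnings
from pathlib import Path
import ruamel.yaml


# Same body as CommentedBase.yaml_set_anchor, but always_dump now defaults
# to True, so every anchor set while loading is written back out on dump,
# even if nothing aliases it any more.
def yaml_set_anchor(self, value, always_dump=True):
    self.anchor.value = value
    self.anchor.always_dump = always_dump

# Patch the class before loading so the constructor picks up the new default
ruamel.yaml.comments.CommentedBase.yaml_set_anchor = yaml_set_anchor

in_file = Path('file_in.yaml')

yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True  # load anyway; later duplicates are dropped
yaml.preserve_quotes = True
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence warnings emitted while loading
    data = yaml.load(in_file)
yaml.dump(data, sys.stdout)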

which gives:

components: &base
  type1: 3  # sample comment
  type2: 0x353
  type3: 1.2.3.4
  type4: "bla"
  schemas:
    description: 'ex1' # this is comment
test2:
  <<: *base
  type4: 4555  # under the sea :)

yellow: &yellow
  bla: 1

collor: &yellow
  bla: 2

paint:
  color: *yellow

slot_si_value_t: &bla1
  desc: hav slot 2
  slot_number: 2   #SI_SLOT_2
  inst_max: 4

slot:
- slot_si_value: *bla1
- slot_si_value: &bla2
    desc: hav slot 4
    slot_number: 4 #SI_SLOT_4
    inst_max: 4

You should update your input file to remove or rename the duplicate keys. Even then, the document will not round-trip exactly in ruamel.yaml, because your indentation is inconsistent (e.g. the mapping under components is indented two spaces, while the mapping under slot_si_value_t is indented four), and ruamel.yaml normalizes that.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1