Validating Indentation within YAML file

I am trying to write Python code that checks whether a YAML file is indented correctly and flags an error if any inconsistencies exist.

For example, in the file below, the second occurrence of the "class" key has 4 spaces before it when it should have 6 spaces (like the first occurrence).

I have dozens of these YAML files with thousands of entries, so I need an automated way to check whether the indentation is wrong.

How could I achieve this within Python?

students:
  incoming:
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - John Walsh
      - Heather Dunbar
      class:
      - 1258
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - Alfred Flynn
      - Joe Diaz      
    class: ## incorrectly indented entry.
      - 3662

Here's my code:

class_indentation = "      class:"

with open("yamls/students.yaml", "r") as file:
    for line_number, line in enumerate(file, start=1):  
        if class_indentation in line:
          print(f"Indentation for '{class_indentation}' is valid: {line_number}")
          break
        else:
          print(f"Indentation for '{class_indentation}' is NOT valid: {line_number}")
print("Search completed.")


Solution 1:[1]

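TL;DR: use a YAML parser and test for valid nesting, i.e. check whether each class node is a child of destination. Indentation defines structure in YAML, so the misindented file is still valid YAML; the class key simply ends up under a different parent.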

YAML parsers

There are 2 major YAML parsers for Python:

  • ruamel.yaml, preserves more of the original (like comments, ordering, etc.)
  • pyyaml, which can be seen as the predecessor to ruamel.yaml

For simplicity I will use pyyaml below.

Using pyyaml to test valid nesting

I reused the function from the answers to "How to parse deeply nested yaml data structures in python" here.

The code below flags invalid indentation whenever a class element is not a child of destination:

import yaml

yaml_text = '''
students:
  incoming:
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - John Walsh
      - Heather Dunbar
      class:
      - 1258
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - Alfred Flynn
      - Joe Diaz      
    class: ## incorrectly indented entry.
      - 3662
'''

def lookup(sk, d, path=None):
    # Yield a tuple (path to the value, value) for every occurrence of key sk.
    # Note: list items are appended to the path as-is, not as indexes.
    path = path or []
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield (path + [k], v)
            yield from lookup(sk, v, path + [k])
    elif isinstance(d, list):
        for item in d:
            yield from lookup(sk, item, path + [item])


tree_dict = yaml.safe_load(yaml_text)
for segments, value in lookup("class", tree_dict):
    # A top-level "class" key has no parent segment, so guard the [-2] lookup.
    if len(segments) < 2 or segments[-2] != 'destination':
        print("Invalid indentation!  Not child of 'destination':")
    else:
        print("OK:")
    print(f"\tpath-segments: {segments}\n\tvalue: {value}")

Prints:

OK:
    path-segments: ['students', 'incoming', {'enrolled': True, 'semester': 'final', 'destination': {'name': ['John Walsh', 'Heather Dunbar'], 'class': [1258]}}, 'destination', 'class']
    value: [1258]
Invalid indentation!  Not child of 'destination':
    path-segments: ['students', 'incoming', {'enrolled': True, 'semester': 'final', 'destination': {'name': ['Alfred Flynn', 'Joe Diaz']}, 'class': [3662]}, 'class']
    value: [3662]
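
The output above shows where the key sits in the tree, but not where it sits in the file. As a rough sketch (not part of the original answer), PyYAML's yaml.compose returns the node graph with position marks, so the line and column of each class key can be reported. The shortened sample text, the helper name find_key_positions and the expected column of 6 are assumptions made for illustration.

```python
import yaml

# Shortened version of the sample file; the second "class" key is misindented.
sample = """\
students:
  incoming:
  - enrolled: TRUE
    destination:
      name:
      - John Walsh
      class:
      - 1258
  - enrolled: TRUE
    destination:
      name:
      - Joe Diaz
    class:
      - 3662
"""

EXPECTED_COLUMN = 6  # assumption: "class" should always start at column 6


def find_key_positions(node, key):
    """Yield (line, column) of every mapping key named `key`.

    Lines are 1-based; columns are 0-based, which equals the number of
    leading spaces when the key is the first token on its line.
    """
    if isinstance(node, yaml.MappingNode):
        for key_node, value_node in node.value:
            if isinstance(key_node, yaml.ScalarNode) and key_node.value == key:
                yield key_node.start_mark.line + 1, key_node.start_mark.column
            yield from find_key_positions(value_node, key)
    elif isinstance(node, yaml.SequenceNode):
        for child in node.value:
            yield from find_key_positions(child, key)


root = yaml.compose(sample, Loader=yaml.SafeLoader)
positions = list(find_key_positions(root, "class"))
for line, column in positions:
    status = "OK" if column == EXPECTED_COLUMN else "misindented"
    print(f"line {line}: 'class' at column {column} ({status})")
```

With this sample, the first class key is reported at line 7, column 6, and the second at line 13, column 4.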

Using ruamel.yaml to test valid nesting

You can also adapt this to ruamel.yaml without loss of functionality. Simply change the import and the loading call:

# import yaml
from ruamel.yaml import YAML

# tree_dict = yaml.safe_load(yaml_text)
tree_dict = YAML(typ='safe').load(yaml_text)
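
Since the question mentions dozens of files, the same nesting check could be wrapped in a loop over a directory. The following is only a sketch using PyYAML; the function names key_paths and check_directory are hypothetical, and it records list indexes in the path instead of the list items themselves.

```python
from pathlib import Path

import yaml


def key_paths(data, key, path=()):
    """Yield the path (tuple of keys and list indexes) to every `key`."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                yield path + (k,)
            yield from key_paths(v, key, path + (k,))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from key_paths(item, key, path + (index,))


def check_directory(directory, key="class", parent="destination"):
    """Map each file name to the paths where `key` is not a child of `parent`."""
    report = {}
    for yaml_file in sorted(Path(directory).glob("*.yaml")):
        with yaml_file.open(encoding="utf-8") as handle:
            data = yaml.safe_load(handle)
        misplaced = [
            p for p in key_paths(data, key)
            if len(p) < 2 or p[-2] != parent
        ]
        if misplaced:
            report[yaml_file.name] = misplaced
    return report
```

Calling check_directory("yamls") would then return an empty dict when every class key sits directly under destination, and otherwise list the offending paths per file.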

Alternative: validate YAML using a schema

Alternatively, you can validate your YAML files against a schema, for example using JSON Schema, since YAML can be seen as a superset of JSON.

See Validating a yaml document in python for more.
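
A minimal sketch of the schema approach, assuming the third-party jsonschema package is installed. The schema is a guess at the intended structure based on the sample file, not something the question specifies. To keep the example short, the data is written as the Python structure that yaml.safe_load returns for the sample.

```python
from jsonschema import Draft7Validator

# Assumed schema: "class" is only allowed (and required) inside "destination".
schema = {
    "type": "object",
    "properties": {
        "students": {
            "type": "object",
            "properties": {
                "incoming": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "enrolled": {"type": "boolean"},
                            "semester": {"type": "string"},
                            "destination": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "array"},
                                    "class": {"type": "array"},
                                },
                                "required": ["name", "class"],
                                "additionalProperties": False,
                            },
                        },
                        "required": ["destination"],
                        "additionalProperties": False,
                    },
                },
            },
        },
    },
}

# What yaml.safe_load produces for the sample file.
data = {
    "students": {
        "incoming": [
            {"enrolled": True, "semester": "final",
             "destination": {"name": ["John Walsh", "Heather Dunbar"],
                             "class": [1258]}},
            {"enrolled": True, "semester": "final",
             "destination": {"name": ["Alfred Flynn", "Joe Diaz"]},
             "class": [3662]},
        ]
    }
}

validator = Draft7Validator(schema)
# Collect (path, message) pairs; the path is joined into a readable string.
problems = sorted(
    ("/".join(str(part) for part in error.absolute_path), error.message)
    for error in validator.iter_errors(data)
)
for path, message in problems:
    print(f"{path}: {message}")
```

For the sample data this reports two problems on the second entry: an unexpected class key at students/incoming/1 and a missing class key at students/incoming/1/destination.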

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1