Validating Indentation within YAML file
I am trying to write Python code that checks whether the entries in a YAML file are indented correctly and flags an error if any inconsistencies exist.
For example, the second occurrence of the "class" key below has 4 spaces before it when it should have 6 spaces (like the first occurrence).
I have dozens of these YAML files with thousands of entries. So, I need an automated way to check if the indentation is inaccurate.
How could I achieve this within Python?
students:
  incoming:
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - John Walsh
      - Heather Dunbar
      class:
      - 1258
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - Alfred Flynn
      - Joe Diaz
    class: ## incorrectly indented entry.
    - 3662
Here's my code:
class_indentation = "      class:"
with open("yamls/students.yaml", "r") as file:
    for line_number, line in enumerate(file, start=1):
        if class_indentation in line:
            print(f"Indentation for '{class_indentation}' is valid: {line_number}")
            break
        else:
            print(f"Indentation for '{class_indentation}' is NOT valid: {line_number}")
print("Search completed.")
Solution 1:[1]
TL;DR: Use a YAML parser and test for valid nesting: check whether the class node is a child of destination.
YAML parsers
There are two major YAML parsers for Python:
ruamel.yaml, which preserves more of the original (like comments, ordering, etc.)
pyyaml, which can be seen as the predecessor to ruamel.yaml
For simplicity I will use pyyaml below.
Using pyyaml to test valid nesting
I found "How to parse deeply nested yaml data structures in python" and reused the function from its answer here.
The code below flags invalid indentation whenever a class element is not a child of destination:
import yaml

yaml_text = '''
students:
  incoming:
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - John Walsh
      - Heather Dunbar
      class:
      - 1258
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - Alfred Flynn
      - Joe Diaz
    class: ## incorrectly indented entry.
    - 3662
'''


def lookup(sk, d, path=None):
    # Search nested dicts/lists for the key sk and
    # yield a tuple (path to the value, value) for every match.
    if path is None:
        path = []
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield (path + [k], v)
            yield from lookup(sk, v, path + [k])
    elif isinstance(d, list):
        for item in d:
            # The whole list item becomes a path segment.
            yield from lookup(sk, item, path + [item])


tree_dict = yaml.safe_load(yaml_text)
for (segments, value) in lookup("class", tree_dict):
    # segments[-1] is 'class' itself, segments[-2] is its direct parent.
    if len(segments) < 2 or segments[-2] != 'destination':
        print("Invalid indentation! Not child of 'destination':")
    else:
        print("OK:")
    print(f"\tpath-segments: {segments}\n\tvalue: {value}")
Prints:
OK:
    path-segments: ['students', 'incoming', {'enrolled': True, 'semester': 'final', 'destination': {'name': ['John Walsh', 'Heather Dunbar'], 'class': [1258]}}, 'destination', 'class']
    value: [1258]
Invalid indentation! Not child of 'destination':
    path-segments: ['students', 'incoming', {'enrolled': True, 'semester': 'final', 'destination': {'name': ['Alfred Flynn', 'Joe Diaz']}, 'class': [3662]}, 'class']
    value: [3662]
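With dozens of files, the same parent check could be wrapped into a small batch helper. The following is only a minimal sketch, not part of the original answer: find_misplaced and check_directory are hypothetical names, the assumption is that all files sit in one folder with a .yaml extension, and paths use list indexes instead of whole list items so the report stays short.

```python
from pathlib import Path


def find_misplaced(node, key="class", parent="destination", path=()):
    """Yield the path of every `key` whose direct parent is not `parent`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and (not path or path[-1] != parent):
                yield "/".join(map(str, path + (k,)))
            yield from find_misplaced(v, key, parent, path + (k,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            # Use the list index as path segment to keep paths readable.
            yield from find_misplaced(item, key, parent, path + (index,))


def check_directory(directory):
    """Return {file name: [misplaced paths]} for each *.yaml file (assumed layout)."""
    import yaml  # PyYAML; imported here so find_misplaced() also works on plain data

    report = {}
    for yaml_file in sorted(Path(directory).glob("*.yaml")):
        with yaml_file.open(encoding="utf-8") as stream:
            problems = list(find_misplaced(yaml.safe_load(stream)))
        if problems:
            report[yaml_file.name] = problems
    return report


# Already-parsed data mirroring the two entries from the question.
sample = {"students": {"incoming": [
    {"destination": {"name": ["John Walsh"], "class": [1258]}},
    {"destination": {"name": ["Alfred Flynn"]}, "class": [3662]},
]}}
print(list(find_misplaced(sample)))  # ['students/incoming/1/class']
```

Calling check_directory("yamls") would then map each file name to the paths of its misplaced class keys. Files that fail to parse raise yaml.YAMLError, which this sketch does not catch.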
Using ruamel.yaml to test valid nesting
You can adapt this to ruamel.yaml without loss of functionality.
Simply change the import and the loading call:
# import yaml
from ruamel.yaml import YAML
# tree_dict = yaml.safe_load(yaml_text)
tree_dict = YAML(typ='safe').load(yaml_text)
Alternative: validate YAML using a schema
Alternatively, you can validate your YAML files against a schema, for example using JSON Schema, since YAML can be seen as a superset of JSON.
See "Validating a yaml document in python" for more.
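As a rough illustration of the schema route, the sketch below describes the structure from the question and lets the third-party jsonschema package report violations. The schema itself is an assumption inferred from the sample data (the real files may allow more keys), and SCHEMA and schema_errors are hypothetical names.

```python
# Assumed structure, inferred from the sample in the question.
SCHEMA = {
    "type": "object",
    "properties": {
        "students": {
            "type": "object",
            "properties": {
                "incoming": {"type": "array", "items": {"$ref": "#/definitions/entry"}},
            },
        },
    },
    "definitions": {
        "entry": {
            "type": "object",
            "properties": {
                "enrolled": {"type": "boolean"},
                "semester": {"type": "string"},
                "destination": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "array", "items": {"type": "string"}},
                        "class": {"type": "array", "items": {"type": "integer"}},
                    },
                    # A destination without 'class' is reported.
                    "required": ["name", "class"],
                    "additionalProperties": False,
                },
            },
            "required": ["destination"],
            # A stray 'class' next to 'destination' is reported here.
            "additionalProperties": False,
        },
    },
}


def schema_errors(data):
    """Return one 'location: message' string per schema violation."""
    # Third-party package (pip install jsonschema), imported lazily so that
    # SCHEMA can be inspected without the dependency.
    from jsonschema import Draft7Validator

    return [
        f"{'/'.join(map(str, error.absolute_path))}: {error.message}"
        for error in Draft7Validator(SCHEMA).iter_errors(data)
    ]
```

The data passed to schema_errors would be the result of yaml.safe_load. For the mis-indented entry, the validator should complain both about the unexpected class key on the entry and about the missing class key inside destination.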
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
