Is there a standard algorithm or library for representing data deltas?

I have a Python dictionary with fairly complex structure — multiple layers of nested values, some of which are dicts and some of which are lists. I want to represent changes to the data in a compact way that can be applied easily.

For dictionary-only values, it seems not too hard: you can make a delta dict that mirrors the structure of the main data but contains only the modified keys, and call a slightly modified .update() that recognizes a tombstone value when a key needs to be deleted entirely.

But with lists involved, it seems to get a lot trickier. It seems like I'd need to come up with some kind of custom addressing scheme that needs to account for a lot of cases — you can't just naively use list indices as keys, because you need to support e.g. the removal of element 5 at the same time as an insertion between elements 2 and 3.

Additionally, if lists aren't restricted to being leaves, it's tricky to specify changes to items contained in a list while also adding or removing elements of that same list.
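To make the list problem concrete, here is a minimal sketch of one possible addressing scheme, assuming every index in the delta refers to a position in the original list. The op tuples ("patch", "delete", "insert") and the name apply_list_delta are made up for illustration and do not come from any library. Patches are applied first because they move nothing; structural ops then run from the highest index down so that lower indices are still valid when their turn comes.

```python
def apply_list_delta(items, ops):
    """Apply ops whose indices all refer to positions in the ORIGINAL list.

    Hypothetical op formats:
      ("patch", i, changes)  -> shallow-merge changes into the dict at index i
      ("delete", i)          -> remove the element at original index i
      ("insert", i, value)   -> insert value before original index i
    """
    result = list(items)  # shallow copy; the input list is left untouched

    # Patches never shift positions, so they can be applied first.
    for op in ops:
        if op[0] == "patch":
            _, index, changes = op
            result[index] = {**result[index], **changes}

    # Structural ops run from the highest original index down. At the same
    # index the delete runs before the insert, and reversing the input first
    # keeps several inserts at one index in their listed order.
    structural = [op for op in ops if op[0] in ("delete", "insert")]
    ordered = sorted(
        reversed(structural),
        key=lambda op: (op[1], op[0] == "delete"),
        reverse=True,
    )
    for op in ordered:
        if op[0] == "delete":
            del result[op[1]]
        else:
            result.insert(op[1], op[2])
    return result


items = ["a", {"name": "b", "n": 1}, "c", "d", "e", "f"]
ops = [("patch", 1, {"n": 2}), ("delete", 5), ("insert", 3, "X")]
print(apply_list_delta(items, ops))
# ['a', {'name': 'b', 'n': 2}, 'c', 'X', 'd', 'e']
```

The patch here is only a shallow merge; a fuller version would recurse into nested dicts and lists, which is exactly where the scheme starts to get complicated.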

Is there a Python library that standardizes something like this? Or a standard algorithm/approach that's relatively sane to implement?
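As a point of comparison for the list case, the standard library's difflib.SequenceMatcher already computes edit opcodes between two sequences. The sketch below is only an illustration under two assumptions: the list elements are hashable (so it does not cover lists of dicts), and the helper names diff_lists and apply_opcodes are invented here. It records the non-equal opcodes as a delta and replays them back to front so that earlier indices stay valid.

```python
from difflib import SequenceMatcher


def diff_lists(old, new):
    """Return (tag, i1, i2, replacement) for every span of old that changes."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_opcodes(old, delta):
    """Replay a delta from diff_lists against a copy of old."""
    result = list(old)
    # Opcodes are in ascending order, so walk backwards: replacing a later
    # span never shifts the indices of an earlier one.
    for _tag, i1, i2, replacement in reversed(delta):
        result[i1:i2] = replacement
    return result


old = ["a", "b", "c", "d", "e", "f", "g"]
new = ["a", "b", "c", "X", "d", "e", "g"]
delta = diff_lists(old, new)
assert apply_opcodes(old, delta) == new
```

Slice assignment handles all three opcode kinds uniformly: an insert has an empty source span, a delete has an empty replacement, and a replace has both.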

For reference, here is a function that implements what I'm looking for, restricted to dict-only data:

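import collections.abc

def update(d, u):
    """Recursively apply delta u to dict d in place and return d."""
    for k, v in u.items():
        if v == 'del':
            # Tombstone: remove the key entirely.
            del d[k]
        elif isinstance(v, collections.abc.Mapping):
            # Nested delta: recurse, creating the sub-dict if it is missing.
            d[k] = update(d.get(k, {}), v)
        else:
            # Plain value: overwrite.
            d[k] = v
    return d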

>>> d = {1: 2, 3: {4: 5, 6: 7}}
>>> delta = {3: {4: 'del', 6: 8}, 9: 10}
>>> update(d, delta)
{1: 2, 3: {6: 8}, 9: 10}
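One caveat with the function above is that the string 'del' can then never be stored as a real value. A minimal sketch of the same idea using a dedicated sentinel object might look like the following; the names DELETE and apply_delta are hypothetical, and the handling of a non-dict value being replaced by a nested delta is an assumption about the desired behavior.

```python
import collections.abc

# Hypothetical tombstone: a unique object that cannot collide with real data.
DELETE = object()


def apply_delta(data, delta):
    """Recursively apply delta to data in place and return data."""
    for key, value in delta.items():
        if value is DELETE:
            # Identity check, so ordinary values are never mistaken for it.
            data.pop(key, None)
        elif isinstance(value, collections.abc.Mapping):
            existing = data.get(key)
            # Assumption: a nested delta replaces a missing or non-dict value.
            if not isinstance(existing, collections.abc.MutableMapping):
                existing = {}
            data[key] = apply_delta(existing, value)
        else:
            data[key] = value
    return data


d = {1: 2, 3: {4: 5, 6: 7}, "note": "del"}
print(apply_delta(d, {3: {4: DELETE, 6: 8}, 9: 10}))
# {1: 2, 3: {6: 8}, 'note': 'del', 9: 10}
```

The trade-off is that a plain object() sentinel does not survive serialization, so a delta sent over the wire would need some agreed-upon encoding for it.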



Solution 1:[1]

Here's a modified version of the dict class that allows for addressing values by multiple keys:

class MultiKeyDict(dict):
    def __setitem__(self, key, value):
        # Lists are unhashable, so a list key is stored as the equivalent tuple.
        if isinstance(key, list):
            key = tuple(key)
        super().__setitem__(key, value)
        if isinstance(key, tuple):
            # Created lazily here rather than in __init__ to avoid overriding it.
            if not hasattr(self, "extmap"):
                self.extmap = {}
            # Redirect every component key to the tuple the value is stored under.
            self.extmap.update(dict.fromkeys(key, key))

    def __getitem__(self, key):
        if isinstance(key, list):
            key = tuple(key)
        if key in self.keys():
            return super().__getitem__(key)
        if hasattr(self, "extmap") and key in self.extmap:
            return super().__getitem__(self.extmap[key])
        # Call the super method again to trigger the correct exception.
        return super().__getitem__(key)

It only stores the actual value once: under the key itself if an ordinary key is provided, or under a tuple key if the provided key is a tuple or a list. The individual keys inside that tuple are saved in a second dict, which redirects each of them to the tuple key that the value itself is stored under.

This is technically a response to @Stef's answer but I am not yet able to post replies as I just created my Stack Overflow account and there's a minimum requirement of 50 reputation points.

Solution 2:[2]

Check out this example: https://hands-on.cloud/aws-cloudformation-how-to-create-dms-infrastructure-for-relational-db-migration/

I usually find the official documentation useful too. For each resource type, it provides both the JSON and YAML formats along with a detailed specification: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/AWS_DMS.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dunkmania
Solution 2 Register Sole