Is there a standard algorithm or library for representing data deltas?
I have a Python dictionary with a fairly complex structure: multiple layers of nested values, some of which are dicts and some of which are lists. I want to represent changes to the data in a compact way that can be applied easily.
For dictionary-only values, it seems not too hard. You can make a dict that mirrors the structure of the main data but includes only the modified keys (nested under their parents), and apply it with a slightly modified, recursive .update() that recognizes a tombstone value for keys that need to be deleted entirely.
But with lists involved, it seems to get a lot trickier. It seems like I'd need to come up with some kind of custom addressing scheme that needs to account for a lot of cases — you can't just naively use list indices as keys, because you need to support e.g. the removal of element 5 at the same time as an insertion between elements 2 and 3.
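A minimal sketch of one such addressing scheme, assuming a hypothetical op format (`("delete", i)`, `("replace", i, value)`, `("insert", i, value)`) in which every index refers to the *original* list. Applying the ops from the highest index down means no edit ever shifts an element that still has to be processed, so a deletion at 5 and an insertion before 3 can live in the same delta without interfering.

```python
from collections import defaultdict


def apply_list_ops(items, ops):
    """Return a new list with `ops` applied.

    Hypothetical op format: ("delete", i), ("replace", i, value) or
    ("insert", i, value). Every index i refers to a position in the
    ORIGINAL list; i == len(items) means "append at the end".
    """
    by_index = defaultdict(list)
    for op in ops:
        by_index[op[1]].append(op)

    result = list(items)
    # Work from the highest index down, so edits never shift the
    # positions of elements that are still waiting to be processed.
    for index in sorted(by_index, reverse=True):
        group = by_index[index]
        # Touch the original element at `index` first...
        for op in group:
            if op[0] == "delete":
                del result[index]
            elif op[0] == "replace":
                result[index] = op[2]
        # ...then insert the new values in front of it, in the order listed.
        result[index:index] = [op[2] for op in group if op[0] == "insert"]
    return result


# Remove element 5 and insert "x" between elements 2 and 3 in one delta.
print(apply_list_ops([0, 1, 2, 3, 4, 5, 6], [("delete", 5), ("insert", 3, "x")]))
# [0, 1, 2, 'x', 3, 4, 6]
```

Resolving a delete/replace and an insert at the same index by handling the original element before inserting in front of it keeps the result independent of the order in which the ops were written.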
Additionally, if lists aren't restricted to being leaves, it's tricky to specify changes to items contained in a list while also inserting or removing elements of that list itself.
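For lists that are not leaves, one option (a sketch with a made-up format, not an established standard) is to give each list its own delta dict with `patch`, `delete` and `insert` keys, all indexed against the original list. Nested patches are applied first, while nothing has moved yet, and structural changes are applied afterwards from the end of the list. Dict handling follows the question's `update()`, with the string `'del'` as the tombstone; the function and key names are hypothetical.

```python
import collections.abc

DELETE = "del"  # tombstone, same convention as the question's update()


def apply_delta(data, delta):
    """Apply `delta` to `data` in place (recursing into dicts and lists) and return it."""
    if isinstance(data, list):
        return apply_list_delta(data, delta)
    for key, value in delta.items():
        if isinstance(value, str) and value == DELETE:
            del data[key]
        elif isinstance(value, collections.abc.Mapping):
            data[key] = apply_delta(data.get(key, {}), value)
        else:
            data[key] = value
    return data


def apply_list_delta(items, delta):
    """Hypothetical list-delta format, all indices relative to the ORIGINAL list:
    "patch":  {i: sub_delta_or_new_value}  changes to existing elements
    "delete": [i, ...]                     elements to remove
    "insert": {i: [value, ...]}            values to put before element i
    """
    # 1. Nested edits first: no element has moved yet, so original indices are valid.
    for index, sub in delta.get("patch", {}).items():
        if isinstance(sub, collections.abc.Mapping):
            items[index] = apply_delta(items[index], sub)
        else:
            items[index] = sub  # plain replacement of the element
    # 2. Structural edits, highest index first so lower indices stay valid.
    deletes = set(delta.get("delete", ()))
    inserts = delta.get("insert", {})
    for index in sorted(deletes | set(inserts), reverse=True):
        if index in deletes:
            del items[index]
        items[index:index] = inserts.get(index, [])
    return items


data = {"rows": [{"x": 1}, {"x": 2}, 3, 4, 5, 6]}
delta = {"rows": {"patch": {1: {"x": 20}}, "delete": [5], "insert": {3: ["new"]}}}
print(apply_delta(data, delta))
# {'rows': [{'x': 1}, {'x': 20}, 3, 'new', 4, 5]}
```

Whether a Mapping in the delta means "dict delta" or "list delta" is decided by the type of the value it is applied to, so the delta itself needs no extra type markers.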
Is there a Python library that standardizes something like this? Or a standard algorithm/approach that's relatively sane to implement?
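One established format for this is JSON Patch (RFC 6902): a delta is an ordered list of operations, each addressed by a JSON Pointer path (RFC 6901) such as `/a/3`. Because each operation's indices refer to the document as left by the previous operations, the list-index ambiguity goes away; the producer just has to emit the operations in a consistent order. The third-party `jsonpatch` package implements the full spec (including `make_patch` to compute a patch from two documents and `apply_patch` to apply one). What follows is only a minimal pure-Python sketch of the idea: it supports `add`, `remove` and `replace`, skips the RFC's validation rules, does not handle the root pointer `""`, and, since pointer tokens are strings, assumes string dict keys.

```python
import copy


def parse_pointer(path):
    """Split a JSON Pointer (RFC 6901) like "/a/0/b~1c" into ["a", "0", "b/c"]."""
    if not path.startswith("/"):
        raise ValueError(f"unsupported pointer: {path!r}")  # root "" not handled here
    # RFC 6901: decode "~1" to "/" first, then "~0" to "~".
    return [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]


def apply_patch(doc, patch):
    """Apply add/remove/replace operations, in order, to a deep copy of `doc`."""
    doc = copy.deepcopy(doc)
    for op in patch:
        *parents, last = parse_pointer(op["path"])
        target = doc
        for token in parents:  # walk down to the container that holds `last`
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if isinstance(target, list):
            # "-" means "past the end of the array" (only meaningful for add).
            index = len(target) if last == "-" else int(last)
            if kind == "add":
                target.insert(index, op["value"])
            elif kind == "remove":
                del target[index]
            elif kind == "replace":
                target[index] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind!r}")
        elif kind in ("add", "replace"):
            target[last] = op["value"]
        elif kind == "remove":
            del target[last]
        else:
            raise ValueError(f"unsupported op: {kind!r}")
    return doc


doc = {"a": [0, 1, 2, 3, 4, 5, 6], "b": {"c": 1}}
patch = [
    {"op": "remove", "path": "/a/5"},             # indices refer to the current state,
    {"op": "add", "path": "/a/3", "value": "x"},  # so the order of ops matters
    {"op": "replace", "path": "/b/c", "value": 2},
]
print(apply_patch(doc, patch))
# {'a': [0, 1, 2, 'x', 3, 4, 6], 'b': {'c': 2}}
```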
For reference, here is a function that does what I'm looking for with dict-only data:
```python
import collections.abc


def update(d, u):
    """Recursively apply the delta `u` to the dict `d`; the value 'del' deletes a key."""
    for k, v in u.items():
        if v == 'del':
            del d[k]
        elif isinstance(v, collections.abc.Mapping):
            d[k] = update(d.get(k, {}), v)
        else:
            d[k] = v
    return d
```

```python
>>> d = {1: 2, 3: {4: 5, 6: 7}}
>>> delta = {3: {4: 'del', 6: 8}, 9: 10}
>>> update(d, delta)
{1: 2, 3: {6: 8}, 9: 10}
```
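The inverse operation, computing such a delta from two versions of the data, can be sketched in the same style. This hypothetical `make_delta()` is not part of any library; it assumes dict-only data and the same `'del'` tombstone, so it inherits two limitations of `update()`: a real value equal to `'del'` cannot be represented, and a non-dict value that becomes a dict is emitted as a plain dict, which `update()` would then try (and fail) to recurse into.

```python
import collections.abc


def make_delta(old, new, tombstone="del"):
    """Hypothetical helper: build a delta such that update(old, delta) yields `new`."""
    delta = {}
    # Keys that disappeared get the tombstone.
    for key in old.keys() - new.keys():
        delta[key] = tombstone
    for key, value in new.items():
        if key not in old:
            delta[key] = value  # brand-new key
        elif isinstance(value, collections.abc.Mapping) and isinstance(old[key], collections.abc.Mapping):
            sub = make_delta(old[key], value, tombstone)
            if sub:  # only record nested dicts that actually changed
                delta[key] = sub
        elif old[key] != value:
            delta[key] = value  # changed leaf value
    return delta


print(make_delta({1: 2, 3: {4: 5, 6: 7}}, {1: 2, 3: {6: 8}, 9: 10}))
# {3: {4: 'del', 6: 8}, 9: 10}
```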
Solution 1:[1]
Here's a modified version of the dict class that allows for addressing values by multiple keys:
```python
class MultiKeyDict(dict):
    """A dict whose values can also be looked up by any single member of a tuple key.

    Note: dict.__init__ and dict.update() bypass __setitem__, so assign keys one at a time.
    """

    def __setitem__(self, key, value):
        if isinstance(key, list):
            key = tuple(key)  # lists are unhashable, so store them as tuple keys
        super().__setitem__(key, value)
        if isinstance(key, tuple):
            if not hasattr(self, "extmap"):  # could be done in __init__, but done lazily here to avoid overriding it
                self.extmap = {}
            # Redirect each member of the tuple to the tuple the value is stored under.
            self.extmap.update(dict.fromkeys(key, key))

    def __getitem__(self, key):
        if key in self.keys():
            return super().__getitem__(key)
        elif hasattr(self, "extmap") and key in self.extmap:
            return super().__getitem__(self.extmap[key])
        return super().__getitem__(key)  # call the parent again to raise the correct KeyError
```
It stores each value only once: under the key as given if it is a single key such as a str, or under a tuple key if the provided key is a tuple or a list. The individual members of a tuple key are saved in a separate dict, extmap, which redirects each of them to the key the value is actually stored under.
This is technically a response to @Stef's answer, but I can't post comments yet: I just created my Stack Overflow account, and commenting requires at least 50 reputation points.
Solution 2:[2]
Check out this example. https://hands-on.cloud/aws-cloudformation-how-to-create-dms-infrastructure-for-relational-db-migration/.
I usually find the official documentation useful too. For each resource type, it provides examples in both JSON and YAML format along with a detailed specification. https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/AWS_DMS.html.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dunkmania |
| Solution 2 | Register Sole |
