'Updating dictionaries based on huge list of dicts

I have a huge(around 350k elements) list of dictionaries:

lst = [
{'data': 'xxx', 'id': 1456},
{'data': 'yyy', 'id': 24234},
{'data': 'zzz', 'id': 3222},
{'data': 'foo', 'id': 1789},
]

On the other hand I receive dictionaries(around 550k) one by one with missing value(not every dict is missing this value) which I need to update from:

example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': None}

To:

example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': 'xxx'}

And I need to take each dict and search withing the list for matching 'id' and update the 'data'. Doing it this way takes ages to process:

if example_dict['data'] is None:
    for row in lst:
        if row['id'] == example_dict['id']:
            example_dict['data'] = row['data']

Is there a way to build a structured chunked data divided to in e.g. 10k values and tell the incoming dict in which chunk to search for the 'id'? Or any other way to to optimize that? Any help is appreciated, take care.



Solution 1:[1]

Some preprocessing of lst list might help a lot. E.g. transform that list of dicts into dictionary, where id would be a key.

To be precise transform lst into such structure:

lst = {
    '1456': 'xxx',
    '24234': 'yyy',
    '3222': 'zzz',
    ...
} 

Then when trying to check your data attributes in example_dict, just access straight to id key in lst as follows:

if example_dict['data'] is None:
    example_dict['data'] = lst.get(example_dict['id'])

It should reduce the time complexity from something as quadratic complexity (n*n) to linear complexity (n).

Solution 2:[2]

Try creating creating a hash table (in Python, a dict) from lst to speed up the lookup based on 'id':

        lst = [
            {'data': 'xxx', 'id': 1456},
            {'data': 'yyy', 'id': 24234},
            {'data': 'zzz', 'id': 3222},
            {'data': 'foo', 'id': 1789},
            ]        
        example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': None}        
        dct ={row['id'] : row for row in lst}
        if example_dict['data'] is None:
            example_dict['data'] = dct[example_dict['id']]['data']
        print(example_dict)

Sample output:

{'key': 'x', 'key2': 'y', 'id': 1456, 'data': 'xxx'}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 constantstranger