Convert a list of dicts [{str: int}, {str: int}, ...] to a single dict {str: int}

Given a data-structure like this:

[{'a':1, 'b': 2}, {'c':3 }, {'a':4, 'c':9}, {'d':0}, {'d': 0, 'b':6}]

The goal is to parse the data to produce:

{'a': 2.5, 'b': 4, 'c': 6, 'd': 0}

by doing:

  • Accumulate the values for each unique key,
  • Average the values per key

What's a simple way to do this?


I've tried the following and it works:

from collections import defaultdict
from statistics import mean


x = [{'a':1, 'b': 2}, {'c':3 }, {'a':4, 'c':9}, {'d':0}, {'d': 0, 'b':6}]

z = defaultdict(list)

for y in x:
    for k, v in y.items():
        z[k].append(v)

output = {k: mean(v) for k,v in z.items()}

But is there a simpler way to achieve the same data-parsing? Maybe with collections.Counter or something?



Solution 1:[1]

If you want to use Counter, you can sum the values and count the keys separately, then divide the two to get the average:

from collections import Counter

original = [{'a':1, 'b': 2}, {'c':3 }, {'a':4, 'c':9}, {'d':0}, {'d': 0, 'b':6}]

sum_counter = dict(sum([Counter(x) for x in original], Counter()))
count_counter = dict(sum([Counter(x.keys()) for x in original], Counter()))
final = {k: sum_counter.get(k,0)/count_counter[k] for k in count_counter}

print(final)

Output:

{'a': 2.5, 'b': 4.0, 'c': 6.0, 'd': 0.0}
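
One caveat with the Counter approach: adding two Counter objects with + keeps only keys whose total is positive, so zero totals (like 'd' here) and negative totals disappear from sum_counter. The .get(k, 0) fallback covers the zero case, but a key whose values sum to a negative number would come out as 0. A minimal sketch of the difference, using Counter.update (which keeps zero and negative totals); the sample data below is made up for illustration:

```python
from collections import Counter

# Made-up sample: 'a' sums to a negative total, 'd' sums to zero.
data = [{'a': -1, 'd': 0}, {'a': -3, 'd': 0}]

# Counter + Counter drops every key whose total is <= 0.
added = sum((Counter(d) for d in data), Counter())
print(dict(added))  # {}

# Counter.update adds in place and keeps zero and negative totals.
updated = Counter()
for d in data:
    updated.update(d)

# Count how many dictionaries contain each key.
counts = Counter(k for d in data for k in d)

means = {k: updated[k] / counts[k] for k in counts}
print(means)  # {'a': -2.0, 'd': 0.0}
```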

EDIT: I had another idea, which might be a simpler solution to your problem (turns out it is also a lot faster). The idea is to go over your list of dictionaries and create a new dictionary, where for each key the sum of values and number of occurrences is saved. Afterward, we can simply compute the average for each key by dividing the two values of the key.

from collections import defaultdict

original = [{'a':1, 'b': 2}, {'c':3 }, {'a':4, 'c':9}, {'d':0}, {'d': 0, 'b':6}]

ddict = defaultdict(lambda: [0,0])

for dictionary in original:
    for key in dictionary:
        ddict[key][0] += dictionary[key]
        ddict[key][1] += 1

final = {k: ddict[k][0]/ddict[k][1] for k in ddict}
print(final)

Output is still the same:

{'a': 2.5, 'b': 4.0, 'c': 6.0, 'd': 0.0}

Solution 2:[2]

One option (similar to @JANO's answer) is to use collections.Counter twice: once to get the sum of the values for each key, and once to count how often each key occurs across all dictionaries. A dict comprehension then gives the mean values:

from collections import Counter
from itertools import chain

lst = [{'a':1, 'b': 2}, {'c':3 }, {'a':4, 'c':9}, {'d':0}, {'d': 0, 'b':6}]

sums = sum(map(Counter, lst), Counter())
counts = Counter(chain.from_iterable(map(dict.keys, lst)))
out = {k: sums[k] / v for k,v in counts.items()}

Yet another option is to use cytoolz.dicttoolz.merge_with to create a dictionary of lists, then iterate over it to get the mean values:

from cytoolz.dicttoolz import merge_with
out = {k: sum(v)/len(v) for k,v in merge_with(list, *lst).items()}

Output:

{'a': 2.5, 'b': 4.0, 'c': 6.0, 'd': 0.0}

Timings:

>>> lst = [{'a':1, 'b': 2}, {'c':3 }, {'a':4, 'c':9}, {'d':0}, {'d': 0, 'b':6}] * 100000

>>> %timeit counter_dc(lst)
3.32 s ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit defaultdict_dc(lst)
241 ms ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit dicttools_dc(lst)
66.9 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

where the functions are:

def counter_dc(lst):
    sums = sum(map(Counter, lst), Counter())
    counts = Counter(chain.from_iterable(map(dict.keys, lst)))
    return {k: sums[k] / v for k,v in counts.items()}
    
def defaultdict_dc(lst):
    out = defaultdict(list)
    for d in lst:
        for k,v in d.items():
            out[k].append(v)
    return {k: sum(v)/len(v) for k,v in out.items()}

def dicttools_dc(lst):
    return {k: sum(v)/len(v) for k,v in merge_with(list, *lst).items()}
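
%timeit is an IPython magic. Outside IPython or Jupyter, a comparable measurement could be sketched with the standard-library timeit module. The list size (1000 repeats) and the number of runs (3) below are arbitrary choices to keep the run short, not the settings used for the timings above, and the cytoolz variant is left out so the sketch needs only the standard library:

```python
import timeit
from collections import Counter, defaultdict
from itertools import chain

def counter_dc(lst):
    sums = sum(map(Counter, lst), Counter())
    counts = Counter(chain.from_iterable(map(dict.keys, lst)))
    return {k: sums[k] / v for k, v in counts.items()}

def defaultdict_dc(lst):
    out = defaultdict(list)
    for d in lst:
        for k, v in d.items():
            out[k].append(v)
    return {k: sum(v) / len(v) for k, v in out.items()}

# Assumed, smaller input than in the timings above, to keep the run quick.
lst = [{'a': 1, 'b': 2}, {'c': 3}, {'a': 4, 'c': 9}, {'d': 0}, {'d': 0, 'b': 6}] * 1000

runs = 3  # arbitrary number of repetitions
for fn in (counter_dc, defaultdict_dc):
    seconds = timeit.timeit(lambda: fn(lst), number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per call")
```

Absolute numbers will differ by machine and Python version, so only the relative ordering is meaningful.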

Solution 3:[3]

If you're open to using pandas, then simply:

import pandas as pd

lst = [{"a": 1, "b": 2}, {"c": 3}, {"a": 4, "c": 9}, {"d": 0}, {"d": 0, "b": 6}]

print(pd.DataFrame(lst).mean().to_dict())

Prints:

{'a': 2.5, 'b': 4.0, 'c': 6.0, 'd': 0.0}

Solution 4:[4]

How does this work for you?

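def average_dicts(in_data):
    totals = {}  # key -> running sum of values
    counts = {}  # key -> number of occurrences
    for dic in in_data:
        for key, val in dic.items():
            totals[key] = totals.get(key, 0) + val
            counts[key] = counts.get(key, 0) + 1

    # Divide each sum by its count. Averaging pairwise, (old + val) / 2,
    # would only be correct for keys that appear at most twice.
    return {key: totals[key] / counts[key] for key in totals}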

if __name__ == "__main__":
    
    in_data = [{'a':1, 'b': 2}, {'c':3 }, {'a':4, 'c':9}, {'d':0}, {'d': 0, 'b':6}]
    print(average_dicts(in_data))
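
Averaging pairwise, as in (old + val) / 2, matches the true mean only while a key appears at most twice, which happens to hold for the sample data. If a single pass without storing sums is preferred, one option is the incremental mean update, mean += (val - mean) / n. A minimal sketch (running_means is a hypothetical name, and the three-dict input is made up to show a key appearing three times):

```python
def running_means(dicts):
    """Return the per-key mean using an incremental update."""
    means, counts = {}, {}
    for d in dicts:
        for key, val in d.items():
            counts[key] = counts.get(key, 0) + 1
            prev = means.get(key, 0.0)
            # Move the previous mean toward val by 1/n of the gap.
            means[key] = prev + (val - prev) / counts[key]
    return means

# Made-up input where 'a' appears three times.
data = [{'a': 1}, {'a': 4}, {'a': 10}]
print(running_means(data))  # {'a': 5.0}

# Pairwise averaging on the same values gives a different result.
pairwise = ((1 + 4) / 2 + 10) / 2
print(pairwise)  # 6.25
```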

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Andrej Kesely
Solution 4 lonny