Combine duplicate keys into a dictionary and average their values
I have a list of (key, value) tuples defined as follows:
movies=[('x',57),('y', 23),('z', 12), ('y', 10), ('x',22),('y',12)]
As you can see, some keys appear more than once. I combined the duplicates, and the resulting dictionary looks like this:
{'x': [57, 22], 'y': [23,10,12], 'z': [12]}
Whenever a key has two or more values, I want the value stored for that key in the final dictionary to be the average of its values in the original data. The averaging should only happen when a single key has two or more values associated with it.
Hence the final dictionary should be as follows:
{'x': [39.5], 'y': [15], 'z': [12]}
Where:
- key x = (57 + 22) / 2 = 39.5
- key y = (23 + 10 + 12) / 3 = 15
- key z = 12 (the value stays the same because key z occurs only once)
Solution 1:[1]
This is not functionally different from the other answers, but it has no reliance on additional module imports:
movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]
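md = dict()
# Group values by key: {'x': [57, 22], 'y': [23, 10, 12], 'z': [12]}
for k, v in movies:
    md.setdefault(k, []).append(v)
# Replace each list of values with a one-element list holding its mean
md = {k: [sum(v) / len(v)] for k, v in md.items()}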
print(md)
Output:
{'x': [39.5], 'y': [15.0], 'z': [12.0]}
The output is what the question asks for, although keeping each mean inside a one-element list seems unnecessary.
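If a plain number per key is enough, a minimal variant of Solution 1 (a sketch, not part of the original answer; the names grouped and means are illustrative) drops the one-element list and stores each mean directly:

```python
movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# Group the values under each key (same approach as Solution 1).
grouped = {}
for key, value in movies:
    grouped.setdefault(key, []).append(value)

# Store the mean directly as a number instead of a one-element list.
means = {key: sum(values) / len(values) for key, values in grouped.items()}
print(means)  # {'x': 39.5, 'y': 15.0, 'z': 12.0}
```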
Solution 2:[2]
Traverse the data, saving each character as a key and its values in a list.
Then traverse again to calculate the average of the values.
>>> from collections import defaultdict as dd
>>> movies=[('x',57),('y', 23),('z', 12), ('y', 10), ('x',22),('y',12)]
>>> x = dd(list)
>>>
>>> for a, v in movies:
...     x[a].append(v)
...
>>> for i, v in x.items():
...     x[i] = [sum(v)/len(v)]
...
>>> x
defaultdict(<class 'list'>, {'x': [39.5], 'y': [15.0], 'z': [12.0]})
>>>
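Solution 2 keeps every value and walks the data twice. As an alternative sketch (a swapped-in technique, not the answerer's code; the names totals and averages are illustrative), a single pass can keep only a running sum and count per key. Building a plain dict at the end also avoids the defaultdict(...) wrapper in the printed result:

```python
from collections import defaultdict

movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# Single pass: keep a running [total, count] per key instead of the full list.
totals = defaultdict(lambda: [0, 0])
for key, value in movies:
    totals[key][0] += value  # running sum
    totals[key][1] += 1      # number of occurrences

# Convert to a plain dict of means, wrapped in lists to match the requested shape.
averages = {key: [total / count] for key, (total, count) in totals.items()}
print(averages)  # {'x': [39.5], 'y': [15.0], 'z': [12.0]}
```

Storing [total, count] instead of the full list needs constant memory per key, at the cost of no longer having the individual values available afterwards.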
Solution 3:[3]
Starting from the combined dictionary, I do this and it works (but maybe there is a better way):
my_dic = {'x': [57, 22], 'y': [23, 10, 12], 'z': [12]}
for key in my_dic:
    if len(my_dic[key]) > 1:
        my_dic[key] = [sum(my_dic[key]) / len(my_dic[key])]
print(my_dic)
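Solution 3 starts from the already combined dictionary. A sketch that begins from the raw movies list, groups it, and then applies the same rule (average only keys with two or more values) could look like this. Note that a key with a single value keeps its original int inside the list, so z stays [12] rather than becoming [12.0]:

```python
movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# Step 1: group values by key, e.g. {'x': [57, 22], 'y': [23, 10, 12], 'z': [12]}
my_dic = {}
for key, value in movies:
    my_dic.setdefault(key, []).append(value)

# Step 2: Solution 3's rule, only average keys that have two or more values.
# Reassigning the value of an existing key while looping is safe:
# no keys are added or removed, so the dict's size never changes.
for key, values in my_dic.items():
    if len(values) > 1:
        my_dic[key] = [sum(values) / len(values)]

print(my_dic)  # {'x': [39.5], 'y': [15.0], 'z': [12]}
```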
Solution 4:[4]
You can use groupby from itertools for that:
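from itertools import groupby
from statistics import mean

movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# groupby only merges adjacent items, so sort by key first
result = {key: [mean(item[1] for item in group)]
          for key, group in groupby(
              sorted(movies, key=lambda x: x[0]),
              lambda x: x[0]
          )}
print(result)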
Output:
{'x': [39.5], 'y': [15], 'z': [12]}
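Sorting matters in Solution 4 because itertools.groupby only groups consecutive items that share a key; it does not collect equal keys from across the whole input. A small sketch that lists the group keys with and without sorting (using operator.itemgetter(0) in place of the lambda, which is equivalent here; unsorted_keys and sorted_keys are illustrative names):

```python
from itertools import groupby
from operator import itemgetter

movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# Without sorting, groupby starts a new group every time the key changes,
# so 'x' and 'y' show up more than once.
unsorted_keys = [key for key, _ in groupby(movies, key=itemgetter(0))]
print(unsorted_keys)  # ['x', 'y', 'z', 'y', 'x', 'y']

# After sorting by the same key, equal keys are adjacent and form one group each.
sorted_keys = [key for key, _ in groupby(sorted(movies, key=itemgetter(0)), key=itemgetter(0))]
print(sorted_keys)  # ['x', 'y', 'z']
```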
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | sahasrara62 |
| Solution 3 | magimix |
| Solution 4 | |
