How to aggregate distinct values of one key, then sum the matching values of the other key?

I've made a loop that gives me data in the following format:

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '1'}, 
 {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00002', 'quantity': '1'}]

I used the following loop to get the values above:

namesequence = EventSequence.objects.filter(description="names").values("Details")

name_quant = [{ 'name_id': e['element'][33:39], 
                        'quantity': e['element'][50:51] } for e in namesequence ]

So my question is: how can I group by name_id and sum the quantities of matching name_ids so that I get a result like this:

 name_sum = [{'name_id': 'S00001', 'quantity': '160'}, {'name_id': 'S00002', 'quantity': '50'}, {'name_id': 'S00003', 'quantity': '40'}, {'name_id': 'S00004', 'quantity': '90'}]

I would have used Django's sum aggregation, but I have to slice each value in a loop first, which makes it a bit more complicated.



Solution 1:[1]

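If I understand the question correctly, the requirement is to sum the quantities for each distinct name_id. I can't see how the required output values are derived from the sample input data, but that may be because the sample is incomplete. The loop below keeps a running integer total per name_id in a dict, then converts the totals back into a list of dicts with string quantities.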

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '1'}, 
 {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00002', 'quantity': '1'}]

td = dict()

for e in name_quant:
    nid = e['name_id']
    td[nid] = td.get(nid, 0) + int(e['quantity'])

new_list = [{'name_id': k, 'quantity': str(v)} for k, v in td.items()]

print(new_list)

Output:

[{'name_id': 'S00004', 'quantity': '3'}, {'name_id': 'S00003', 'quantity': '5'}, {'name_id': 'S00002', 'quantity': '1'}]
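
The expected result in the question is ordered by name_id, whereas the dict above keeps first-seen order. As a minimal sketch of the same idea, the variant below wraps the loop in a function and sorts the keys. The function name sum_by_name_id is hypothetical, and it assumes every quantity is a string that int() can parse.

```python
from collections import defaultdict


def sum_by_name_id(rows):
    """Sum the string quantities per name_id and return dicts sorted by name_id."""
    totals = defaultdict(int)  # missing keys start at 0
    for row in rows:
        totals[row['name_id']] += int(row['quantity'])
    # Convert the totals back to strings to match the input format
    return [{'name_id': k, 'quantity': str(totals[k])} for k in sorted(totals)]


name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '1'},
              {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'},
              {'name_id': 'S00002', 'quantity': '1'}]

name_sum = sum_by_name_id(name_quant)
print(name_sum)
# [{'name_id': 'S00002', 'quantity': '1'}, {'name_id': 'S00003', 'quantity': '5'}, {'name_id': 'S00004', 'quantity': '3'}]
```

Because the function accepts any iterable of dicts, it could probably be fed the list comprehension from the question directly, without building name_quant first.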

Solution 2:[2]

If name_quant is large, I prefer to use pandas to do the groupby work:

import pandas as pd

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '1'},
              {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'},
              {'name_id': 'S00002', 'quantity': '1'}]

df = pd.DataFrame.from_records(name_quant)
df['quantity'] = df['quantity'].astype(int)
results = df.groupby(['name_id']).agg({'quantity': 'sum'}).to_records()  # [('S00002', 1) ('S00003', 5) ('S00004', 3)]
# Cast to a plain int so the printed result does not show NumPy integer types
grouped_name_quant = [{'name_id': x[0], 'quantity': int(x[1])} for x in results]
print(grouped_name_quant)

Output:

[{'name_id': 'S00002', 'quantity': 1}, {'name_id': 'S00003', 'quantity': 5}, {'name_id': 'S00004', 'quantity': 3}]
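
If pandas is not available, the same group-then-sum idea can be sketched with the standard library. This is only an illustration, not part of the original answer: itertools.groupby merges only adjacent items with equal keys, so the list has to be sorted by name_id first. The names by_id and grouped are made up for this example.

```python
from itertools import groupby
from operator import itemgetter

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '1'},
              {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'},
              {'name_id': 'S00002', 'quantity': '1'}]

by_id = itemgetter('name_id')

# Sort first: groupby only groups consecutive rows that share a key
grouped = [
    {'name_id': name_id, 'quantity': sum(int(r['quantity']) for r in rows)}
    for name_id, rows in groupby(sorted(name_quant, key=by_id), key=by_id)
]
print(grouped)
# [{'name_id': 'S00002', 'quantity': 1}, {'name_id': 'S00003', 'quantity': 5}, {'name_id': 'S00004', 'quantity': 3}]
```

As in the pandas version, the quantities come back as integers and the result is ordered by name_id.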

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Albert Winestein
Solution 2 Menglong Li