How to write MapReduce in shell

I have some data like this:

00 13
00 15
01 12
02 52
02 12

How can I get the MapReduce (MR) result, i.e. the second column summed per key?

00 28
01 12
02 64 

I'm a novice at Bash; any advice would be appreciated. Thanks!



Solution 1:[1]

Answer from @123's comment above:

 awk 'NF { a[$1] += $2 } END { for (i in a) print i, a[i] }' file

This has two parts. First, as awk reads the file, every line with one or more fields adds $2 (the value of the second column on the line) to the element of associative array a[] at index $1 (the value of the first column on the line). This keeps a running total per key, so duplicate keys are summed as they are found.
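As a quick check, the one-liner can be run against the sample data from the question. This is only a minimal sketch: sum_by_key is a hypothetical wrapper name, and the trailing sort is an addition here, because awk does not define the order in which for (i in a) visits the keys.

```shell
# sum_by_key is a hypothetical wrapper, not part of the original answer.
# It reads "key value" lines from the given files, or from stdin if none.
sum_by_key() {
  # a[$1] accumulates the second column for each distinct first column;
  # sort makes the output order deterministic.
  awk 'NF { a[$1] += $2 } END { for (i in a) print i, a[i] }' "$@" | sort
}

# The sample data from the question, fed on standard input.
printf '%s\n' '00 13' '00 15' '01 12' '02 52' '02 12' | sum_by_key
# prints:
# 00 28
# 01 12
# 02 64
```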

Once the file is fully parsed, the END stanza triggers and you loop for each item i within a[]. When given multiple arguments, print will separate them with the output field separator (OFS, which defaults to a single space), so this prints the array index i, a space, then the sum of all the input's rows matching that index (a[i]).
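To see the effect of OFS, here is a small sketch that sets it to a tab with awk's -v option. The function name sum_by_key_tsv and the sort at the end are assumptions made for this illustration, not part of the original answer.

```shell
# sum_by_key_tsv is a hypothetical name used only for this example.
sum_by_key_tsv() {
  # -v OFS='\t' assigns the output field separator before input is read,
  # so "print i, a[i]" puts a tab between the key and its sum.
  awk -v OFS='\t' 'NF { a[$1] += $2 } END { for (i in a) print i, a[i] }' "$@" | sort
}

printf '%s\n' '00 13' '00 15' '01 12' | sum_by_key_tsv
```

The comma in print i, a[i] is what triggers OFS; writing print i a[i] without the comma would concatenate the two values with nothing between them.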

I added the NF test as a safety so that blank rows are ignored. (Zero fields means NF is zero, and when evaluated as a boolean, zero is false, so the condition is not met and nothing is run for that line.)
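The difference the NF test makes can be shown on input that contains a blank line. In this sketch, with_guard and without_guard are hypothetical helper names chosen for the comparison.

```shell
# Hypothetical helpers: the same program with and without the NF condition.
with_guard()    { awk 'NF { a[$1] += $2 } END { for (i in a) print i, a[i] }'; }
without_guard() { awk '{ a[$1] += $2 } END { for (i in a) print i, a[i] }'; }

# Blank line skipped: a single output line, "00 28".
printf '00 13\n\n00 15\n' | with_guard

# Blank line processed: an extra entry with an empty key and a sum of 0
# appears (printed as " 0"); the order of the two lines is unspecified.
printf '00 13\n\n00 15\n' | without_guard
```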

Solution 2:[2]

I noticed that the question didn't have a direct answer in Bash itself, even though I agree that the awk solution is much more fun than the one I'm giving here. However, I did want to offer the code in Bash, as it's not very sophisticated and someone might want to modify it.

The input file should be a whitespace-separated list of key-value pairs, one pair per line.

#!/usr/bin/bash

declare -A HASHMAP

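# read splits each line on whitespace into KEY and VALUE.
while read -r KEY VALUE
do
  [[ -n $KEY ]] || continue   # skip blank lines
  # An unseen key counts as 0 before VALUE is added.
  HASHMAP["$KEY"]=$(( ${HASHMAP[$KEY]:-0} + VALUE ))
done < input_file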

for KEY in "${!HASHMAP[@]}"
do
    echo "$KEY ${HASHMAP[$KEY]}"
done
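One thing to watch for in the Bash version: arithmetic expansion treats numbers with a leading zero as octal, so a value such as 08 makes it fail. The following is a hedged variant, not the author's code. sum_pairs is a hypothetical function name, it reads from standard input instead of input_file, and it assumes Bash 4 or later and non-negative integer values.

```shell
# sum_pairs is a hypothetical function; needs bash 4+ (associative arrays).
sum_pairs() {
  local -A totals=()
  local key value
  # read splits on whitespace; "_" swallows any extra columns.
  while read -r key value _; do
    [[ -z $key ]] && continue   # skip blank lines
    # 10# forces base 10, so a value such as 08 is not rejected as octal.
    totals[$key]=$(( ${totals[$key]:-0} + 10#${value:-0} ))
  done
  # Associative array order is not guaranteed, so sort the result.
  for key in "${!totals[@]}"; do
    printf '%s %s\n' "$key" "${totals[$key]}"
  done | sort
}

printf '%s\n' '00 08' '00 09' '' '01 12' | sum_pairs
# prints:
# 00 17
# 01 12
```

It can be used on a file with sum_pairs < input_file.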

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Peter Rhodes