How to write MapReduce in shell
I have some data like this:
00 13
00 15
01 12
02 52
02 12
How can I get this MapReduce-style result, where the second column is summed for each key in the first column?
00 28
01 12
02 64
I'm a novice at bash, so any advice would be appreciated. Thanks!
Solution 1:[1]
Answer from @123's comment above:
awk 'NF { a[$1] += $2 } END { for (i in a) print i, a[i] }' file
This has two parts. First, as awk parses the file, every line with one or more fields adds $2 (the value of the second column on the line) to the associative array element a[$1] (indexed by the value of the first column on the line). Array elements start out empty, which awk treats as zero in arithmetic, so this builds a running total per key and sums duplicate keys as it finds them.
Once the file is fully parsed, the END block runs and loops over each index i in a[]. When print is given multiple arguments, it separates them with the output field separator (OFS, which defaults to a single space), so this prints the array index i, a space, then the sum of all input rows with that index (a[i]). Note that for (i in a) visits the indices in an unspecified order, so pipe the output through sort if you need it ordered by key.
I added the NF test as a safeguard so that blank rows are ignored. (A blank row has zero fields, so NF is zero; zero is false when evaluated as a condition, so nothing is run for that line.)
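To try the one-liner end to end, here is a small script that writes the question's sample data (plus one blank line to exercise the NF test) to a temporary file and runs the same awk program on it. Because for (i in a) returns keys in no particular order, the output is piped through sort. The temporary file and the result variable are only for illustration.

```shell
#!/usr/bin/env bash
# Write the question's sample data, plus a blank line, to a temporary file.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf '%s\n' '00 13' '00 15' '01 12' '' '02 52' '02 12' > "$input"

# Sum column 2 per key in column 1; the NF test skips the blank line.
# awk's for (i in a) order is unspecified, so sort the output by key.
result=$(awk 'NF { a[$1] += $2 } END { for (i in a) print i, a[i] }' "$input" | sort)
echo "$result"
```

With this sample data it prints 00 28, 01 12 and 02 64 on separate lines, matching the result asked for in the question.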
Solution 2:[2]
I noticed that the question didn't have a direct answer in Bash itself. I agree that the awk solution is much more fun than the one I'm giving here, but I wanted to offer a Bash version because it's simple and someone might want to modify it.
The input file should contain one whitespace-separated key-value pair per line. The script needs Bash 4 or later, because it uses an associative array (declare -A).
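#!/usr/bin/env bash
# Associative arrays (declare -A) require Bash 4 or later.
declare -A HASHMAP
# Split each line into a key and a value using the default IFS.
while read -r KEY VALUE
do
    # Skip blank lines, which would otherwise produce an empty key.
    [[ -z $KEY ]] && continue
    # Add VALUE to the running total for KEY (a missing entry counts as 0).
    HASHMAP["$KEY"]=$(( ${HASHMAP["$KEY"]:-0} + VALUE ))
done < input_file
# Print each key with its total; associative array order is unspecified.
for KEY in "${!HASHMAP[@]}"
do
    echo "$KEY ${HASHMAP[$KEY]}"
done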
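Both approaches keep every key in memory at once. MapReduce instead sorts (shuffles) the mapped records so that equal keys arrive together at the reducer. A minimal sketch of that shape with standard tools, assuming whitespace-separated key-value lines and using a hypothetical temporary file and result variable: sort groups equal keys, and the awk reducer keeps only the current key and its running total, printing a total each time the key changes.

```shell
#!/usr/bin/env bash
# Hypothetical unsorted sample input with the question's values.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf '%s\n' '02 52' '00 13' '01 12' '00 15' '02 12' > "$input"

# Shuffle: sort on the first field so equal keys are adjacent.
# Reduce: keep one running sum and emit it whenever the key changes.
result=$(LC_ALL=C sort -k1,1 "$input" | awk '
    NF == 0 { next }                  # skip blank lines
    {
        k = $1 ""                     # force string comparison of keys
        if (!seen || k != key) {
            if (seen) print key, sum  # key changed: emit the previous total
            key = k; sum = 0; seen = 1
        }
        sum += $2
    }
    END { if (seen) print key, sum }  # emit the last key
')
echo "$result"
```

Sorting adds work, but sort implementations can typically use temporary files for inputs larger than memory, which is the trade-off this shape illustrates. Concatenating $1 with an empty string matters here: without it, awk may compare numeric-looking keys such as 00 and 0 as numbers and merge them.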
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Peter Rhodes |
