MapReduce Job: How do I take in <Text, IntWritable> during Map phase and output <Text, Text> in Reduce phase?
I am trying to make my output look like the following: Model output
But I am stuck with this: My output
How do I convert the value (IntWritable) from the output to Text and concatenate the string " words" into the output? I also need to format the numbers from the output to start at the same spot as shown in the model answer. The input is <Text, IntWritable> and I am guessing the output has to be <Text, Text>.
My mapper code:
public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final static IntWritable zero = new IntWritable(0);
    private Text word1 = new Text("1.X short:");
    private Text word2 = new Text("2.short:");
    private Text word3 = new Text("3.medium:");
    private Text word4 = new Text("4.long:");
    private Text word5 = new Text("5.X long:");
    private Text word6 = new Text("6.XX long:");

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            int length = word.length();
            if ((length >= 1) && (length <= 3)) {
                context.write(word1, one);
            }
            else
                context.write(word1, zero);
            if ((length >= 4) && (length <= 5)) {
                context.write(word2, one);
            }
            else
                context.write(word2, zero);
            if ((length >= 6) && (length <= 8)) {
                context.write(word3, one);
            }
            else
                context.write(word3, zero);
            if ((length >= 9) && (length <= 12)) {
                context.write(word4, one);
            }
            else
                context.write(word4, zero);
            if ((length >= 13) && (length <= 15)) {
                context.write(word5, one);
            }
            else
                context.write(word5, zero);
            if (length >= 16) {
                context.write(word6, one);
            }
            else
                context.write(word6, zero);
        }
    }
}
My reducer code:
public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        key.set(key.toString().substring(1));
        context.write(key, result);
    }
}
Solution 1:[1]
So, first, you don't need to write zeros at all in the mapper. Just focus on the ones if you are summing data.
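As a rough sketch of what "only the ones" could look like, the six if/else pairs can collapse into a single lookup that returns exactly one label per word. The class name `LengthBuckets` and the method `labelFor` are made up for this illustration; the labels and length ranges are copied from the question's mapper. Inside `map` you would then write one pair per token, something like `context.write(new Text(LengthBuckets.labelFor(word.length())), one)`.

```java
class LengthBuckets {
    // Labels copied from the question; the leading digit keeps the keys sorted.
    static final String[] LABELS = {
        "1.X short:", "2.short:", "3.medium:", "4.long:", "5.X long:", "6.XX long:"
    };
    // Inclusive upper bound of every bucket except the last, which is open-ended.
    static final int[] UPPER_BOUNDS = {3, 5, 8, 12, 15};

    // Returns the single bucket label for a word of the given length.
    // StringTokenizer never yields empty tokens, so length is assumed to be >= 1.
    static String labelFor(int length) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (length <= UPPER_BOUNDS[i]) {
                return LABELS[i];
            }
        }
        return LABELS[LABELS.length - 1];
    }

    public static void main(String[] args) {
        for (String word : "a quick extraordinary demonstration".split(" ")) {
            System.out.println(word + " -> " + labelFor(word.length()));
        }
    }
}
```

One behavioural difference to be aware of: without the zeros, a bucket that no word falls into will not appear in the output at all, instead of showing up with a count of 0.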
Then it's a simple change. Change the job's output value type
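// in the driver
job.setMapOutputValueClass(IntWritable.class); // the mapper still emits IntWritable
job.setOutputValueClass(Text.class);
// if IntSumReducer is also registered as the combiner, remove that call:
// a combiner must emit the map output types <Text, IntWritable>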
And the reducer's type parameters
extends Reducer<Text, IntWritable, Text, Text>
And just write the formatted sum (pass the int sum, not the IntWritable result, which %d cannot format)
context.write(key, new Text(String.format("%d words", sum)));
"format the numbers from the output to start at the same spot"
Is that really necessary? You can do it by padding the key to a fixed width with String.format, but I wouldn't really worry about it.
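If you do want the alignment, here is a minimal sketch of the padding idea in plain Java, so it can be tried outside Hadoop. The class `AlignedOutput`, its methods, and the width of 10 are assumptions made for this example; pick a width that fits your longest label. The `%-10s` specifier left-justifies the label and pads it with spaces, so every key has the same length and the value after the separator starts in the same column.

```java
class AlignedOutput {
    // Hypothetical width: at least as long as the longest label in the question.
    static final int LABEL_WIDTH = 10;

    // Left-justifies the label and pads it with spaces up to LABEL_WIDTH.
    static String padLabel(String label) {
        return String.format("%-" + LABEL_WIDTH + "s", label);
    }

    // Builds the value text from the plain int sum.
    static String describe(int sum) {
        return String.format("%d words", sum);
    }

    public static void main(String[] args) {
        // The tab stands in for the default key/value separator of TextOutputFormat.
        System.out.println(padLabel("short:") + "\t" + describe(42));
        System.out.println(padLabel("XX long:") + "\t" + describe(7));
    }
}
```

In the reducer this would amount to something like `key.set(AlignedOutput.padLabel(key.toString().substring(1)))` before the `context.write` call, with `new Text(AlignedOutput.describe(sum))` as the value.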
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
