Category "mapreduce"

Why is Spark 100 times faster than Hadoop MapReduce?

Why is Spark faster than Hadoop MapReduce? As I understand it, if Spark is faster because of in-memory processing, then Hadoop also loads data into RAM, then i

MapReduce Job Failed on MultiNode

I'm new to Hadoop. I have to use 'MapReduce' with WordCount. I am getting some errors. I am running a 50 GB 'MapReduce' job on a single server (8 GB, 8 cores). It

How to find the min of multiple values in Hive?

Hive has min(col) to find the minimum value of a column. But how about finding the minimum of multiple values (NOT one column), for example select min(2,1,3,4

Iterate Twice in MapReduce

I have written a Reducer job in which my key and value are composite. I need to iterate twice through the values and am hence trying to cache the v

Hive Error : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

I have collected Twitter data on HDFS using Flume. I have a 3-node cluster and a MySQL metastore for Hive. When I execute the query below: select user_name.screen_name, user_n

Pseudocode to Calculate Average Using MapReduce

Hi, I want to write a MapReduce algorithm in pseudocode to solve the following problem: Given input records in the following format: address, zip, city, house_v

Iterate twice through values in Reducer Hadoop

I read in a couple of places that the only way to iterate twice through values in a Reducer is to cache those values. But there is also a limitation that all the