Hive: bucket map join doesn't work, but sort merge bucket join works fine. Why?
- I created two tables using the SQL below and tested the bucket map join. It fails (the job has 3 map tasks and 1 reduce task).
```sql
create table tmp(name string, id int)
clustered by(name) into 4 buckets
stored as textfile;
```
- Then I added "sorted by" to the SQL above and tested the sort merge bucket join. It works (the job has 3 map tasks and 0 reduce tasks).
```sql
create table tmp(name string, id int)
clustered by(name) sorted by(name ASC) into 4 buckets
stored as textfile;
```
Is "sorted by" a requirement for a bucket map join?
Settings:
```sql
set hive.execution.engine=mr;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.auto.convert.join=true;
```
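To see which join strategy Hive actually picked, one option is to run EXPLAIN on the join instead of counting map and reduce tasks. This is a minimal sketch that assumes the second table is named `tmp2` (a hypothetical name, since only one DDL is shown) with the same unsorted bucketed layout, and that the join is on the bucketing column `name`. In an MR plan, a "Reduce Operator Tree" in the join stage suggests Hive fell back to a common shuffle join, while a map-only join stage suggests a map-side (bucketed) join was used.

```sql
-- Hypothetical second table: "tmp2" is an assumed name, defined like the unsorted tmp.
create table tmp2(name string, id int)
clustered by(name) into 4 buckets
stored as textfile;

-- Join on the bucketing column (name) so bucket-based join strategies can apply.
-- EXPLAIN prints the chosen plan without running the job.
explain
select a.name, a.id, b.id
from tmp a
join tmp2 b on a.name = b.name;
```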
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
