'Hive - Optimising a self-join
Let's say I have the following query:
select a.model, a.engine_size, b.engine_size from (
select model, engine_size
from cars
where number_of_doors = 4
) a
inner join (
select model, engine_size
from cars
where number_of_doors = 4
) b
on (a.model = b.model);
I'm repeating a subquery here. I'm just wondering if the following is more 'optimal' or will the repeated subquery's result automatically be cached?
with features as (
select model, engine_size
from cars
where number_of_doors = 4
)
select a.model, a.engine_size, b.engine_size
from features a
inner join features b
on (a.model = b.model);
Is either of these going to be more efficient?
Solution 1:[1]
One way is by doing a self join but the scenario doesn't make any sense
select a.model, a.engine_size,b.engine_size
from cars a
join cars b
on (a.model = b.model)
where a.number_of_doors = 4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pரதீப௠|
