Hive - Optimising a self-join

Let's say I have the following query:

select a.model, a.engine_size, b.engine_size from (

  select model, engine_size
  from cars
  where number_of_doors = 4
) a

inner join (

  select model, engine_size
  from cars
  where number_of_doors = 4
) b

on (a.model = b.model);

I'm repeating a subquery here. Is the following version more 'optimal', or will Hive cache the repeated subquery's result automatically?

with features as (

  select model, engine_size
  from cars
  where number_of_doors = 4
)

select a.model, a.engine_size, b.engine_size
from features a
inner join features b
on (a.model = b.model);

Is either of these going to be more efficient?

sql hadoop hive

Solution 1:^[1]

One way is to write it as a plain self-join with no subqueries, although it isn't clear what joining the table to itself on model is meant to achieve:

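select a.model, a.engine_size, b.engine_size
from   cars a
join   cars b
on     (a.model = b.model)
-- filter both sides so the result matches the subquery version in the question
where  a.number_of_doors = 4
and    b.number_of_doors = 4;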
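
To see whether Hive actually treats the subquery form and the CTE form differently, one option is to compare their plans with EXPLAIN. The sketch below uses the cars table and features CTE from the question. By default Hive typically inlines a CTE at each reference, so the two forms would be expected to produce much the same plan, but that is worth confirming on your own version. The materialization property at the end is an assumption about your Hive release: it is not available in older versions and is disabled (-1) by default.

```sql
-- Plan for the CTE form. Run EXPLAIN on the subquery form as well and
-- compare the stages, e.g. how many scans of cars each plan contains.
explain
with features as (
  select model, engine_size
  from cars
  where number_of_doors = 4
)
select a.model, a.engine_size, b.engine_size
from features a
inner join features b
on (a.model = b.model);

-- Assumption: this property exists in your Hive version.
-- A threshold of 1 is low enough that features, referenced twice, should
-- qualify to be materialized once before the main query runs.
set hive.optimize.cte.materialize.threshold=1;
```

After setting the property, re-running the EXPLAIN above should show whether the plan changes. Whether materializing helps depends on how expensive the filtered scan is compared with writing and re-reading the intermediate result.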

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Pரதீப்
