'Hive inequality join using between operator

We have a query similar to the below: (partition_date is our tables partition)

SELECT * FROM A
JOIN B 
where partition_date > B.last_runtime;

We realized that by placing the condition in the where clause it is causing a full table scan so we need to place it in the JOIN as an ON.

The problem with this is Hive does not support inequality joins so was thinking of using BETWEEN operator like the below:

Select * from A
JOIN B ON par_date between B.last_runtime and '99999999';

this is giving us the error: Both left and right aliases encountered in JOIN ''99999999''

If I replace B.last_runtime with an actual value, say '20160310' it works okay...

any ideas? Thanks in advance



Solution 1:[1]

A BETWEEN B AND C translates to A is greater than or equal to B AND A less than or equal to C, so I think it's still an non-equijoin.

However, I cannot explain the meaning of the error message. It's thrown here, if you'd like to analyze the source code:

private static boolean hasTableAlias(JoinTypeCheckCtx ctx, String tabName, ASTNode expr)
    throws SemanticException {
  int tblAliasCnt = 0;
  for (RowResolver rr : ctx.getInputRRList()) {
    if (rr.hasTableAlias(tabName))
      tblAliasCnt++;
  }

  if (tblAliasCnt > 1) {
    throw new SemanticException(ErrorMsg.INVALID_JOIN_CONDITION_1.getMsg(expr));
  }

  return (tblAliasCnt == 1) ? true : false;
}

Solution 2:[2]

Hive will not support any operations like >,<,<=,>= during join condition. Might be left or right join. Here's an example:

select A.Name, A.Address, B.salary from Person_details as A left join Person_earnings as B on (B.salary > 15000) 

Instead

select A.Name, A.Address, B.salary from Person_details as A left join Person_earnings as B on (A.Id=B.Id) where B.salary > 15000

First, An equality operation should be done, later other conditions can be applied. As Hive is used on large data sets it only supports equality condition first.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Marek Grzenkowicz
Solution 2 dloeda