'join produces wrong results though the joining columns are of same datatype. Right results are after casting them to different datatype

I have an interesting problem to find the root cause. Inner join 2 tables on 2 columns, but the results are not expected. The data types for both columns are the same in both tables. Here is the test code with sample data.

CREATE TABLE IF NOT EXISTS products
(
    prod_id BIGINT NOT NULL
    ,prod_no INTEGER NOT NULL
    ,prod_name VARCHAR(10) NOT NULL
    ,eff_start_dt TIMESTAMP NOT NULL
    ,lob VARCHAR(100)
    ,market VARCHAR(100)
)
DISTSTYLE KEY
 DISTKEY (prod_id)
 SORTKEY (prod_id, prod_no, lob, market)
;

/*
prod_id  | prod_no  | prod_name  | eff_start_dt  | lob     | market |
---------------------------------------------------------------------
123      | 3        | 'abcd'     | 9/13/21       | 'Sales' | 'US'   |
*/


CREATE TABLE IF NOT EXISTS products_sales
(
    prod_id BIGINT NOT NULL
    ,prod_no INTEGER NOT NULL
    ,sales_team VARCHAR(10) NOT NULL
    ,eff_dt TIMESTAMP NOT null
    ,cust_typ VARCHAR(15)
)
DISTSTYLE KEY
 DISTKEY (prod_id)
 SORTKEY (prod_id, prod_no)
;

/*
prod_id  | prod_no  | sales_team  | eff_dt  | cust_typ
------------------------------------------------------
123      | 3        | 'CustServ'  | 12/1/21 | 'Ent'
123      | 3        | 'Marketing' | 11/5/21 | 'Ent'
*/


select tab1.prod_id, tab1.prod_no, tab1.lob, tab2.sales_team
from products tab1 
inner join products_sales tab2
on tab1.prod_id = tab2.prod_id
and tab1.prod_no = tab2.prod_no
where tab2.cust_typ = 'Ent'
and tab1.prod_id = 123
and tab1.prod_no = 3;

/* Results: only 1 row
prod_id  | prod_no  | eff_dt  | sales_team
-------------------------------------------
123      | 3        | 12/1/21 | 'CustServ'
*/

/* Expected Results: 2 rows
prod_id  | prod_no  | eff_dt  | sales_team
-------------------------------------------
123      | 3        | 12/1/21 | 'CustServ'
123      | 3        | 11/5/21 | 'Marketing'
*/

I can achieve the accurate results in many other ways, listed below:

  • If I change the inner join to the left join, it produces accurate results.
  • If I explicitly cast the second column prod_no in the join condition to another datatype, it produces an accurate result. and cast(tab1.prod_no as bigint) = tab2.prod_no
  • If I create a temporary table with just the table join (with no filter) and then query the temporary table with filters, I get the accurate result.
  • If I create another set of these tables and copy the data to the new tables and if I run the same query on new tables, it produces accurate results but existing tables don't.

Here I'm looking for help to find the root cause of this problem in Redshift. Any expert guidance or any lead is highly appreciated!



Solution 1:[1]

The below SQL runs fine on my Redshift. There was an error in your provided code so I expect that you reduce your issue to this test case. I still strongly suspect a PK or Unique constraint that Redshift doesn't enforce. If you are still having issues please provide a demonstration of the issue and/or more information about when this issue occurs.

CREATE TABLE IF NOT EXISTS products
(
    prod_id BIGINT NOT NULL
    ,prod_no INTEGER NOT NULL
    ,prod_name VARCHAR(10) NOT NULL
    ,eff_start_dt TIMESTAMP NOT NULL
    ,lob VARCHAR(100)
    ,market VARCHAR(100)
)
DISTSTYLE KEY
DISTKEY (prod_id)
SORTKEY (prod_id, prod_no, lob, market)
;

insert into products values
(123, 3, 'abcd', '2021/9/13 00:00:00', 'Sales', 'US' );

CREATE TABLE IF NOT EXISTS products_sales
(
    prod_id BIGINT NOT NULL
    ,prod_no INTEGER NOT NULL
    ,sales_team VARCHAR(10) NOT NULL
    ,eff_dt TIMESTAMP NOT null
    ,cust_type VARCHAR(15)  
)
DISTSTYLE KEY
DISTKEY (prod_id)
SORTKEY (prod_id, prod_no)
;

insert into products_sales values
(123, 3, 'CustServ', '2021/12/1 00:00:00', 'Ent'),
(123, 3, 'Marketing', '2021/11/5 00:00:00', 'Ent');

select tab1.prod_id, tab1.prod_no, tab1.lob, tab2.sales_team
from products tab1 
inner join products_sales tab2
on tab1.prod_id = tab2.prod_id
and tab1.prod_no = tab2.prod_no
where tab2.cust_type = 'Ent'
and tab1.prod_id = 123
and tab1.prod_no = 3;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bill Weiner