'Need guidance in joining SQL queries
The column - PCT_COL_IND is a BOOLEAN Column.
SELECT ID,
MAX(CASE WHEN PCT_COL LIKE '%no-response%%' OR PCT_COL LIKE '%ind-not-found%'
OR PCT_COL LIKE '%search_res_not_found%' OR PCT_COL LIKE '%empty%' THEN 'TRUE' ELSE 'FALSE' END) AS PCT_COL_IND
FROM table1 A
LEFT JOIN table2 B
ON B.p_id = A.p_id
WHERE date = '2022-02-02'
GROUP BY ID
)
ID PCT_COL_IND
519181 FALSE
694562 FALSE
694892 FALSE
When I run the above query separately, I'm getting values for PCT_COL_IND as FALSE.
When I join the above query with two other queries, I'm getting values for PCT_COL_IND as NULL. I'm trying to insert the source records to target table using Stored Procedure in Snowflake.
But when I join the above query along with two other queries, I'm getting NULL values for most of the records for PCT_COL_IND column. I should not display the NULL results.
Can someone suggest me with any logic?
I have mentioned the entire query below:
SELECT distinct
a1.id,
a1.date,
a1.begin_time,
a1.Pno,
b1.PCT_COL_TXT,
b1.PCT_COL_IND,
CASE WHEN final.id is NULL then 'I' ELSE 'U' END as DML_Type
FROM(
SELECT
distinct id,
min(date) over (partition by id order by date) AS date ,
first_value(timestamp_col) over (partition by id order by cast(pno as int) nulls last) as begin_time,
first_value(pg_col) over (partition by id order by cast(pno as int) nulls last) as Pno,
row_number() over(partition by id order by date) as RNK
FROM table1
WHERE date = '2021-11-02'
QUALIFY RNK=1
)a1
left join
(
SELECT id,
CASE
WHEN PCT_COL LIKE '%no-response% then true
WHEN PCT_COL LIKE '%no-response%' then true
WHEN PCT_COL LIKE '%no-response% then true
WHEN PCT_COL LIKE '%no-response%' THEN 'TRUE'
ELSE 'FALSE' END AS PCT_COL_IND,
CASE WHEN listagg(DISTINCT PCT_COL ,',')WITHIN GROUP(ORDER BY PCT_COL )='' THEN NULL
ELSE REPLACE(lower(LISTAGG(DISTINCT PCT_COL , ',') WITHIN GROUP(ORDER BY PCT_COL )),'+','') END AS PCT_COL_TXT
FROM table1 B
LEFT JOIN table2
ON B.s_id = A.s_id
WHERE date = '2021-11-02'
GROUP BY 1,2
)b1
ON b1.id=a1.id
LEFT JOIN
(select distinct
id ,
cr_date,
begin_time,
Pno,
fin_pno,
PCT_COL_TXT,
PCT_COL_IND,
row_number() over(partition by id order by cr_date DESC) as RNK
FROM table4
QUALIFY RNK=1
)final on a1.id = final.id
WHERE (final.id is NULL)
or
(
nvl(a1.id,'0') <> nvl(final.id,'0' )
OR nvl(a1.date,'0')<> nvl(final.cr_date,'0')
OR nvl( a1.Pno,'NA') <> nvl(final.Pno,'NA')
OR nvl( a1.fin_pno,'NA') <> nvl(final.fin_pno,'NA')
OR nvl( b1.PCT_COL_TXT,'NA') <> nvl(final.PCT_COL_TXT,'NA')
OR nvl( b1.PCT_COL_IND,'0') <> nvl(final.PCT_COL_IND,'0')
)
I have tried the logic but still getting NULL values.
Solution 1:[1]
So to review your SQL it can be cut down to this:
SELECT
a1.id
,b1.pct_col_ind
,CASE WHEN final.id is NULL then 'I' ELSE 'U' END as DML_Type
FROM (
SELECT
column1 as id
FROM VALUES
(1),
(2),
(3)
) AS a1
LEFT JOIN (
SELECT *
FROM VALUES
(1, true),
(2, true),
(3, false)
v(id, PCT_COL_IND)
) AS b1 ON a1.id = b1.id
LEFT JOIN (
SELECT *
FROM VALUES
--(1, true),
(2, true),
(3, false)
v(id, PCT_COL_IND)
) AS final ON a1.id = final.id
WHERE final.id is NULL
OR nvl(a1.id,'0') <> nvl(final.id,'0' )
OR nvl( b1.PCT_COL_IND,'0') <> nvl(final.PCT_COL_IND,'0')
;
We only get output under 3 conditions, where the final does not match the a1 row (this can be triggered by missing id some id's in final. So lets have 3 ID's in a1 that are not in final. and the in B1 have one of those be true, and one be false and one not exist there.
Now the next reason we get output is a1.id != final.id but given we join on them being equal, we will only be here if final.id is null, thus we already triggered the first OR or if there are nulls in the ID's of A1 at which point both a1.id and final.id are null, and thus converted to '0' thus are equal, thus it's a pointless clause and not the problem we are looking for.
So the third reason to get rows into the output. is if nvl( b1.PCT_COL_IND,'0') <> nvl(final.PCT_COL_IND,'0'). again if the right hand side is null we are already in and if both sides are null we are not going in. Thus this code is saying if I had a value value, and the left join on B1 failed then NVL will make b1.PCT_COL_IND -> '0'
all the talking transforms the code into:
SELECT
a1.id
,b1.pct_col_ind
,CASE WHEN final.id is NULL then 'I' ELSE 'U' END as DML_Type
FROM (
SELECT
column1 as id
FROM VALUES
(1),
(2),
(3),
(4),
(5)
) AS a1
LEFT JOIN (
SELECT *
FROM VALUES
(1, true)
,(2, false)
--,(3, false)
--,(4, true) /* we want to fail to match */
--,(5, true) /* we want to fail to match again */
v(id, PCT_COL_IND)
) AS b1 ON a1.id = b1.id
LEFT JOIN (
SELECT *
FROM VALUES
--(1, true),
--(2, true),
--(3, false)
(4, true),
(5, false)
v(id, PCT_COL_IND)
) AS final ON a1.id = final.id
WHERE final.id is NULL
--OR nvl(a1.id,'0') <> nvl(final.id,'0' )
OR nvl( b1.PCT_COL_IND,'0') <> nvl(final.PCT_COL_IND,'0')
;
| ID | PCT_COL_IND | DML_TYPE |
|---|---|---|
| 1 | TRUE | I |
| 2 | FALSE | I |
| 3 | NULL | I |
| 4 | NULL | U |
so row 5 is not present because final.PCT_COL_IND is false, and the null of b1.PCT_COL_IND is replace with the text string "0" which implicitly is converted to a false boolean value, thus the row is dropped. Here we are starting to see the impact of using strings verse boolean's perhaps you want strings after all.
but row 4 is present because the final.pct_col_ind is true, which is different to the empty string
rows 1,2,3 are present because they are not in final, but "did not have results in b1" thus 3 is null.
BUT you b1 is from the same data as a1, so that case should not happen. But that is how I triggered case 4.. so now I am perplexed. I might have to think this over more..
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Simeon Pilgrim |
