'SQL Join is Returning Multiple Rows
Below is a Snowflake query that is trying to get the serial number that's associated with each account name (ACCT_NAME), account ID (ACCT_ID), SKU ID (SKU_ID_p2), and offer description (OFFER_DESC). However when I bring in the serial number column from the FACT_PRODUCT_OFFER_CUST_VALUE table, it goes from 3 rows to 500+ rows--any idea why? Here is my current query:
SELECT a.NAME AS ACCT_NAME,
a.ID AS ACCT_ID,
p2.NAME AS SKU_ID_p2,
dpo.OFFER_DESC
FROM ASSET ast
JOIN DB_A.SA.ACCOUNT a
ON a.ID = ast.ACCOUNT_ID
JOIN DB_A.SA.PRODUCT_2 p2
ON p2.ID = ast.PRODUCT_2_ID
JOIN DB_A.SA.OPPORTUNITY o
ON o.ACCOUNT_ID = ast.ACCOUNT_ID
JOIN DB_B.SB.DIM_PRODUCT_OFFER dpo
ON dpo.PRODUCT_ID = p2.ID
JOIN DB_B.SB.DIM_PRODUCT_OFFER_CUST_VALUE dpocv
ON dpocv.OFFER_SKU_ID = dpo.OFFER_SKU_ID
JOIN DB_B.SB.FACT_PRODUCT_OFFER_CUST_VALUE fpocv
ON fpocv.SERIAL_NUM = dpocv.SERIAL_NUM
GROUP BY a.NAME,
a.ID,
p2.NAME,
dpo.OFFER_DESC
And here's a sample result of this query:
| ACCT_NAME | ACCT_ID | SKU_ID_P2 | OFFER_DESC |
|---|---|---|---|
| Joe's Hardware Shop | 9084785487488HUY | JH-WEQ-3S-FER-KIR | Plywood: 500M (Config B) |
| Joe's Hardware Shop | 9084785487488HUY | JH-WEQ-230j-FER-KIR | Plywood: 1000M (Config D) |
| Joe's Hardware Shop | 9084785487488HUY | JH-ITK-OPKF-LFPL-KIR1 | Plywood: 250M (Config A) |
However when I bring in the Serial Number column from the FACT_PRODUCT_OFFER_CUST_VALUE table, the original query jumps from 3 rows to 500+ rows. The correct output should have a unique serial number for the three rows in the table above. Correct output below:
| ACCT_NAME | ACCT_ID | SKU_ID_P2 | OFFER_DESC | SERIAL_NUM |
|---|---|---|---|---|
| Joe's Hardware Shop | 9084785487488HUY | JH-WEQ-3S-FER-KIR | Washer: Model A (Config B) | AUEJKLOR |
| Joe's Hardware Shop | 9084785487488HUY | JH-WEQ-230j-FER-KIR | Washer: Model B (Config D) | AGITJRD |
| Joe's Hardware Shop | 9084785487488HUY | JH-ITK-OPKF-LFPL-KIR1 | Washer: Model E (Config A) | AREJKEK |
Updated query with SERIAL_NUM brought in as a column:
SELECT a.NAME AS ACCT_NAME,
a.ID AS ACCT_ID,
p2.NAME AS SKU_ID_p2,
dpo.OFFER_DESC,
fpocv.SERIAL_NUM
FROM ASSET ast
JOIN DB_A.SA.ACCOUNT a
ON a.ID = ast.ACCOUNT_ID
JOIN DB_A.SA.PRODUCT_2 p2
ON p2.ID = ast.PRODUCT_2_ID
JOIN DB_A.SA.OPPORTUNITY o
ON o.ACCOUNT_ID = ast.ACCOUNT_ID
JOIN DB_B.SB.DIM_PRODUCT_OFFER dpo
ON dpo.PRODUCT_ID = p2.ID
JOIN DB_B.SB.DIM_PRODUCT_OFFER_CUST_VALUE dpocv
ON dpocv.OFFER_SKU_ID = dpo.OFFER_SKU_ID
JOIN DB_B.SB.FACT_PRODUCT_OFFER_CUST_VALUE fpocv
ON fpocv.SERIAL_NUM = dpocv.SERIAL_NUM
GROUP BY a.NAME,
a.ID,
p2.NAME,
dpo.OFFER_DESC,
fpocv.SERIAL_NUM
Solution 1:[1]
The simple answer is: You have many rows in the new table with the same value as you are join on. This is the ONLY reason you ever get more rows.
As to the next question why do you have many value that are the same, that is harder to know. But given you have used the word FACT it's making me think of a SCD2 like table, and thus while there is only one Plywood: 500M (Config B) over time it has had different prices. And when you join to a FACT/DIMENSION table like price, it is also had to be with respect to time.
Which implies you ether want the latest price, OR you want the price as the time of the event to a looking at.
Normally a DIMENSION table will have start_time, end_time columns and often I have seen then with end_time being null if it is the current "price", so you will need more clauses on your JOIN ON to bind to the row you want.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Simeon Pilgrim |
