'Get common records between two tables based on condition
Let's say I have two tables: IndustryCustomers and ProductCustomers, they have same schema and just one column like this
IndustryCustomers:
| CustomerId |
|---|
| 1 |
| 2 |
| 3 |
ProductCustomers:
| CustomerId |
|---|
| 2 |
| 3 |
| 4 |
So what I want is:
1- if both industryCustomers and productCustomers have records then get common customers between them (simply by inner join on customerId)
2- if industryCustomers has any records but productCustomer has no records then select all industryCustomers
3- if industryCustomers has not any records then select all product customers
Currently I did this by using IF and select based on conditions, but I wonder if I can get customers by one query.
This is my query
IF EXISTS (SELECT TOP 1 1 FROM #IndustryCustomers)
BEGIN
IF EXISTS (SELECT TOP 1 1 FROM #ProductCustomers)
SELECT *
FROM #IndustryCustomers ic
JOIN #ProductCustomers pc
ON ic.CustomerId = pc.CustomerId;
ELSE
SELECT *
FROM #IndustryCustomers;
END;
ELSE
SELECT *
FROM #ProductCustomers;
Solution 1:[1]
Personally I probably would have utilized @Heinz's approach, but oddly enough the NOT EXISTS were performing worse than my solution. According to the execution plan, it seems the NOT EXISTS was scanning the entire table for no reason, not sure why. Will have to investigate further what is going on (I'm using SQL Server 2017 dev edition).
So here's a very succinct solution that seems to perform better than Heinzi's and Nick's solutions (in my very limited testing)
Succinct Solution using APPLY and Full Join
SELECT FinalCustomerID = ISNULL(I.CustomerID,P.CustomerID)
FROM #IndustryCustomers AS I
FULL JOIN #ProductCustomers AS P
ON I.CustomerID = P.CustomerID
CROSS APPLY (
SELECT
HasI = CASE WHEN EXISTS (SELECT * FROM #IndustryCustomers) THEN 'Y' ELSE 'N' END
,HasP = CASE WHEN EXISTS (SELECT * FROM #ProductCustomers ) THEN 'Y' ELSE 'N' END
) AS C
WHERE ('N' NOT IN (HasI,HasP) AND I.CustomerID = P.CustomerID)
OR (HasI = 'Y' AND HasP = 'N' AND I.CustomerID IS NOT NULL)
OR (HasI = 'N' AND HasP = 'Y' AND P.CustomerID IS NOT NULL)
Solution 2:[2]
You could UNION ALL your three SELECTs and put the corresponding condition in the WHERE clause, e.g.
SELECT ic.CustomerId
FROM #IndustryCustomers AS ic
INNER JOIN #ProductCustomers AS pc ON ic.CustomerId = pc.CustomerId
WHERE EXISTS (SELECT 1 FROM #IndustryCustomers)
AND EXISTS (SELECT 1 FROM #ProductCustomers)
UNION ALL
SELECT ic.CustomerId
FROM #IndustryCustomers AS ic
WHERE EXISTS (SELECT 1 FROM #IndustryCustomers)
AND NOT EXISTS (SELECT 1 FROM #ProductCustomers)
UNION ALL
SELECT pc.CustomerId
FROM #ProductCustomers AS pc
WHERE NOT EXISTS (SELECT 1 FROM #IndustryCustomers)
Obviously, this requires all three SQLs to return the same set of columns, so I've reduced * to the customer id.
I do think, though, that this "solution", while formally satisfying your requirements, is less readable than your current solution...
Solution 3:[3]
TLDR;
SELECT
*
FROM (
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID
) AS x
WHERE x.CustomerID IS NOT NULL;
A break down of the query
Step 1: Get all of the data
If you want one query, but don't want to use a UNION, you will need to do a FULL JOIN of the two tables:
SELECT
*
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID;
| ic.CustomerID | pc.CustomerID |
|---|---|
| 2 | 2 |
| 3 | 3 |
| NULL | 4 |
| 1 | NULL |
Step 2: Filter the data in your select list based on your logic
Now you have all of the data you need to produce your desired results. Now change the columns in your results to return the results you want based on your logic. If there are no ProductCustomers, always return IndustryCustomers, if there are not IndustryCustomers always return ProductCustomers, and if both have records, only return the ones that match.
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID;
| CustomerID |
|---|
| 2 |
| 3 |
| NULL |
| NULL |
Step 3: Clean up results by removing NULLS
This gives you the results you want, but you now have NULLs for rows in the result set that don't match your criteria. You have two options for getting rid of them:
Option 1
Copy your CASE statement to your WHERE clause and use that to filter out NULLs.
Pros: You have one 'SELECT` statement. No real benefit here unless you just prefer the way it looks.
Cons: Harder to read code and if you modify this logic later, you have to remember to update the logic in both places. IMHO, the con in this one is a big con. The chances of this happening are high. I see it happen all of the time when people are making quick updates to code.
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID
WHERE (CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers )
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
) IS NOT NULL;
Option 2
Wrap your query in a query that eliminates NULLS.
Pros: No duplicated logic to maintain, shorter easier to read code.
Cons: It's not a single SELECT statement, but functionally there are no cons.
SELECT
*
FROM (
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID
) AS x
WHERE x.CustomerID IS NOT NULL;
Example code showing results for each scenario
I'm using a Common Table Expression (CTE) and a Table Value Constructor to build the example data. The query that selects the data is the same in each of these.
IndustryCustomers and ProductCustomers both have data
WITH
IndustryCustomers AS (
SELECT
IndustryCustomers.CustomerID
FROM ( VALUES (1), (2), (3)) AS IndustryCustomers (CustomerID)
),
ProductCustomers AS (
SELECT
ProductCustomers.CustomerID
FROM ( VALUES (2), (3), (4)) AS ProductCustomers (CustomerID)
)
SELECT
*
FROM (
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID
) AS x
WHERE x.CustomerID IS NOT NULL;
| CustomerID |
|---|
| 2 |
| 3 |
ProductCustomers contains no data
WITH
IndustryCustomers AS (
SELECT
IndustryCustomers.CustomerID
FROM ( VALUES (1), (2), (3)) AS IndustryCustomers (CustomerID)
),
ProductCustomers AS (
SELECT CustomerID = NULL
WHERE 1 = 2
)
SELECT
*
FROM (
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID
) AS x
WHERE x.CustomerID IS NOT NULL;
| CustomerID |
|---|
| 1 |
| 2 |
| 3 |
IndustryCustomers contains no data
WITH
IndustryCustomers AS (
SELECT CustomerID = NULL
WHERE 1 = 2
),
ProductCustomers AS (
SELECT
ProductCustomers.CustomerID
FROM ( VALUES (2), (3), (4)) AS ProductCustomers (CustomerID)
)
SELECT
*
FROM (
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID
) AS x
WHERE x.CustomerID IS NOT NULL;
| CustomerID |
|---|
| 2 |
| 3 |
| 4 |
Extra notes
When using EXISTS clause always use the form SELECT * FROM .... Not only is the intent of the code more clear, but there are no performance differences between using *, 1, TOP 1 1, or Column1, ..., Column327. SQL Server stops executing the query as soon as it find a single result and never even considers the TOP. If you compare them you'll see that the execution plans are all identical.
EXISTS (SELECT 1...) vs EXISTS (SELECT TOP 1...) Does it matter?
Test with 10,000 records in each table, and only half of them overlap
SET STATISTICS IO, TIME ON
DECLARE
@IndustryStartID int = 1,
@IndustryEndID int = 10,
@ProductStartID int = 5,
@ProductEndID int = 15;
WITH
IndustryCustomers AS (
SELECT CustomerID = @IndustryStartID
UNION ALL
SELECT
ic.CustomerID + 1
FROM IndustryCustomers AS ic
WHERE ic.CustomerID + 1 <= @IndustryEndID
),
ProductCustomers AS (
SELECT CustomerID = @ProductStartID
UNION ALL
SELECT
pc.CustomerID + 1
FROM ProductCustomers AS pc
WHERE pc.CustomerID + 1 <= @ProductEndID
)
SELECT
*
FROM (
SELECT
CustomerID = CASE
WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
ic.CustomerID
WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
pc.CustomerID
WHEN EXISTS (SELECT * FROM IndustryCustomers)
AND EXISTS (SELECT * FROM ProductCustomers)
AND ic.CustomerID = pc.CustomerID THEN
ic.CustomerID
END
FROM IndustryCustomers AS ic
FULL JOIN ProductCustomers AS pc
ON pc.CustomerID = ic.CustomerID
) AS x
WHERE x.CustomerID IS NOT NULL
OPTION (MAXRECURSION 10000);
SET STATISTICS IO, TIME OFF
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stephan |
| Solution 2 | Heinzi |
| Solution 3 |
