'Get common records between two tables based on condition

Let's say I have two tables: IndustryCustomers and ProductCustomers, they have same schema and just one column like this

IndustryCustomers:

CustomerId
1
2
3

ProductCustomers:

CustomerId
2
3
4

So what I want is:

1- if both industryCustomers and productCustomers have records then get common customers between them (simply by inner join on customerId)

2- if industryCustomers has any records but productCustomer has no records then select all industryCustomers

3- if industryCustomers has not any records then select all product customers

Currently I did this by using IF and select based on conditions, but I wonder if I can get customers by one query.

This is my query

IF EXISTS (SELECT TOP 1 1 FROM #IndustryCustomers)
BEGIN
    IF EXISTS (SELECT TOP 1 1 FROM #ProductCustomers)
        SELECT *
        FROM #IndustryCustomers ic
            JOIN #ProductCustomers pc
                ON ic.CustomerId = pc.CustomerId;
    ELSE
        SELECT *
        FROM #IndustryCustomers;
END;
ELSE
    SELECT *
    FROM #ProductCustomers;

Solution 1:^[1]

Personally I probably would have utilized @Heinz's approach, but oddly enough the NOT EXISTS were performing worse than my solution. According to the execution plan, it seems the NOT EXISTS was scanning the entire table for no reason, not sure why. Will have to investigate further what is going on (I'm using SQL Server 2017 dev edition).

So here's a very succinct solution that seems to perform better than Heinzi's and Nick's solutions (in my very limited testing)

Succinct Solution using APPLY and Full Join

SELECT FinalCustomerID = ISNULL(I.CustomerID,P.CustomerID)
FROM #IndustryCustomers AS I
FULL JOIN #ProductCustomers AS P
    ON I.CustomerID = P.CustomerID
CROSS APPLY (
    SELECT 
     HasI = CASE WHEN EXISTS (SELECT * FROM #IndustryCustomers) THEN 'Y' ELSE 'N' END
    ,HasP = CASE WHEN EXISTS (SELECT * FROM #ProductCustomers ) THEN 'Y' ELSE 'N' END
) AS C
WHERE ('N' NOT IN (HasI,HasP) AND I.CustomerID = P.CustomerID)
    OR (HasI = 'Y' AND HasP = 'N' AND I.CustomerID IS NOT NULL)
    OR (HasI = 'N' AND HasP = 'Y' AND P.CustomerID IS NOT NULL)

Solution 2:^[2]

You could UNION ALL your three SELECTs and put the corresponding condition in the WHERE clause, e.g.

SELECT ic.CustomerId 
  FROM #IndustryCustomers AS ic 
       INNER JOIN #ProductCustomers AS pc ON ic.CustomerId = pc.CustomerId
 WHERE EXISTS (SELECT 1 FROM #IndustryCustomers) 
   AND EXISTS (SELECT 1 FROM #ProductCustomers)

UNION ALL 

SELECT ic.CustomerId 
  FROM #IndustryCustomers AS ic
 WHERE EXISTS (SELECT 1 FROM #IndustryCustomers) 
   AND NOT EXISTS (SELECT 1 FROM #ProductCustomers)

UNION ALL 

SELECT pc.CustomerId 
  FROM #ProductCustomers AS pc
 WHERE NOT EXISTS (SELECT 1 FROM #IndustryCustomers)

Obviously, this requires all three SQLs to return the same set of columns, so I've reduced * to the customer id.

I do think, though, that this "solution", while formally satisfying your requirements, is less readable than your current solution...

Solution 3:^[3]

TLDR;

SELECT
    *
FROM    (
    SELECT
        CustomerID  = CASE
                        WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                            ic.CustomerID
                        WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                            pc.CustomerID
                        WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                            AND EXISTS (SELECT      * FROM ProductCustomers)
                            AND ic.CustomerID = pc.CustomerID THEN
                            ic.CustomerID
                    END
    FROM    IndustryCustomers   AS ic
    FULL JOIN ProductCustomers AS pc
        ON pc.CustomerID = ic.CustomerID
) AS x
WHERE   x.CustomerID IS NOT NULL;

A break down of the query

Step 1: Get all of the data

If you want one query, but don't want to use a UNION, you will need to do a FULL JOIN of the two tables:

SELECT
    *
FROM    IndustryCustomers   AS ic
FULL JOIN ProductCustomers AS pc
    ON pc.CustomerID = ic.CustomerID;

ic.CustomerID	pc.CustomerID
2	2
3	3
NULL	4
1	NULL

Step 2: Filter the data in your select list based on your logic

Now you have all of the data you need to produce your desired results. Now change the columns in your results to return the results you want based on your logic. If there are no ProductCustomers, always return IndustryCustomers, if there are not IndustryCustomers always return ProductCustomers, and if both have records, only return the ones that match.

SELECT
    CustomerID  = CASE
                    WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                        ic.CustomerID
                    WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                        pc.CustomerID
                    WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                        AND EXISTS (SELECT      * FROM ProductCustomers)
                        AND ic.CustomerID = pc.CustomerID THEN
                        ic.CustomerID
                END
FROM    IndustryCustomers   AS ic
FULL JOIN ProductCustomers AS pc
    ON pc.CustomerID = ic.CustomerID;

CustomerID
2
3
NULL
NULL

Step 3: Clean up results by removing NULLS

This gives you the results you want, but you now have NULLs for rows in the result set that don't match your criteria. You have two options for getting rid of them:

Option 1

Copy your CASE statement to your WHERE clause and use that to filter out NULLs.

Pros: You have one 'SELECT` statement. No real benefit here unless you just prefer the way it looks.

Cons: Harder to read code and if you modify this logic later, you have to remember to update the logic in both places. IMHO, the con in this one is a big con. The chances of this happening are high. I see it happen all of the time when people are making quick updates to code.

SELECT
    CustomerID  = CASE
                    WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                        ic.CustomerID
                    WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                        pc.CustomerID
                    WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                        AND EXISTS (SELECT      * FROM ProductCustomers)
                        AND ic.CustomerID = pc.CustomerID THEN
                        ic.CustomerID
                END
FROM    IndustryCustomers   AS ic
FULL JOIN ProductCustomers AS pc
    ON pc.CustomerID = ic.CustomerID
WHERE   (CASE
            WHEN NOT EXISTS (SELECT * FROM ProductCustomers) THEN
                ic.CustomerID
            WHEN NOT EXISTS (SELECT * FROM IndustryCustomers) THEN
                pc.CustomerID
            WHEN EXISTS (SELECT * FROM IndustryCustomers)
                AND EXISTS (SELECT * FROM ProductCustomers )
                AND ic.CustomerID = pc.CustomerID THEN
                ic.CustomerID
        END
        )   IS NOT NULL;

Option 2

Wrap your query in a query that eliminates NULLS.

Pros: No duplicated logic to maintain, shorter easier to read code.

Cons: It's not a single SELECT statement, but functionally there are no cons.

SELECT
    *
FROM    (
    SELECT
        CustomerID  = CASE
                        WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                            ic.CustomerID
                        WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                            pc.CustomerID
                        WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                            AND EXISTS (SELECT      * FROM ProductCustomers)
                            AND ic.CustomerID = pc.CustomerID THEN
                            ic.CustomerID
                    END
    FROM    IndustryCustomers   AS ic
    FULL JOIN ProductCustomers AS pc
        ON pc.CustomerID = ic.CustomerID
) AS x
WHERE   x.CustomerID IS NOT NULL;

Example code showing results for each scenario

I'm using a Common Table Expression (CTE) and a Table Value Constructor to build the example data. The query that selects the data is the same in each of these.

`IndustryCustomers` and `ProductCustomers` both have data

WITH
    IndustryCustomers AS (
        SELECT
            IndustryCustomers.CustomerID
        FROM ( VALUES (1), (2), (3)) AS IndustryCustomers (CustomerID)
    ),
    ProductCustomers AS (
        SELECT
            ProductCustomers.CustomerID
        FROM ( VALUES (2), (3), (4)) AS ProductCustomers (CustomerID)
    )
SELECT
    *
FROM    (
    SELECT
        CustomerID  = CASE
                        WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                            ic.CustomerID
                        WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                            pc.CustomerID
                        WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                            AND EXISTS (SELECT      * FROM ProductCustomers)
                            AND ic.CustomerID = pc.CustomerID THEN
                            ic.CustomerID
                    END
    FROM    IndustryCustomers   AS ic
    FULL JOIN ProductCustomers AS pc
        ON pc.CustomerID = ic.CustomerID
) AS x
WHERE   x.CustomerID IS NOT NULL;

CustomerID
2
3

`ProductCustomers` contains no data

WITH
    IndustryCustomers AS (
        SELECT
            IndustryCustomers.CustomerID
        FROM ( VALUES (1), (2), (3)) AS IndustryCustomers (CustomerID)
    ),
    ProductCustomers AS (
        SELECT CustomerID = NULL
        WHERE 1 = 2
    )
SELECT
    *
FROM    (
    SELECT
        CustomerID  = CASE
                        WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                            ic.CustomerID
                        WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                            pc.CustomerID
                        WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                            AND EXISTS (SELECT      * FROM ProductCustomers)
                            AND ic.CustomerID = pc.CustomerID THEN
                            ic.CustomerID
                    END
    FROM    IndustryCustomers   AS ic
    FULL JOIN ProductCustomers AS pc
        ON pc.CustomerID = ic.CustomerID
) AS x
WHERE   x.CustomerID IS NOT NULL;

CustomerID
1
2
3

`IndustryCustomers` contains no data

WITH
    IndustryCustomers AS (
        SELECT CustomerID = NULL
        WHERE 1 = 2
    ),
    ProductCustomers AS (
        SELECT
            ProductCustomers.CustomerID
        FROM ( VALUES (2), (3), (4)) AS ProductCustomers (CustomerID)
    )
SELECT
    *
FROM    (
    SELECT
        CustomerID  = CASE
                        WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                            ic.CustomerID
                        WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                            pc.CustomerID
                        WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                            AND EXISTS (SELECT      * FROM ProductCustomers)
                            AND ic.CustomerID = pc.CustomerID THEN
                            ic.CustomerID
                    END
    FROM    IndustryCustomers   AS ic
    FULL JOIN ProductCustomers AS pc
        ON pc.CustomerID = ic.CustomerID
) AS x
WHERE   x.CustomerID IS NOT NULL;

CustomerID
2
3
4

Extra notes

When using EXISTS clause always use the form SELECT * FROM .... Not only is the intent of the code more clear, but there are no performance differences between using *, 1, TOP 1 1, or Column1, ..., Column327. SQL Server stops executing the query as soon as it find a single result and never even considers the TOP. If you compare them you'll see that the execution plans are all identical.

EXISTS (SELECT 1...) vs EXISTS (SELECT TOP 1...) Does it matter?

Test with 10,000 records in each table, and only half of them overlap

SET STATISTICS IO, TIME ON
DECLARE
    @IndustryStartID    int = 1,
    @IndustryEndID      int = 10,
    @ProductStartID     int = 5,
    @ProductEndID       int = 15;


WITH
    IndustryCustomers AS (
        SELECT CustomerID = @IndustryStartID
        UNION ALL
        SELECT
            ic.CustomerID + 1
        FROM    IndustryCustomers AS ic
        WHERE   ic.CustomerID + 1 <= @IndustryEndID
    ),
    ProductCustomers AS (
        SELECT CustomerID = @ProductStartID
        UNION ALL
        SELECT
            pc.CustomerID + 1
        FROM    ProductCustomers AS pc
        WHERE   pc.CustomerID + 1 <= @ProductEndID
    )
SELECT
    *
FROM    (
    SELECT
        CustomerID  = CASE
                        WHEN NOT EXISTS (SELECT     * FROM ProductCustomers) THEN
                            ic.CustomerID
                        WHEN NOT EXISTS (SELECT     * FROM IndustryCustomers) THEN
                            pc.CustomerID
                        WHEN EXISTS (SELECT     * FROM IndustryCustomers)
                            AND EXISTS (SELECT      * FROM ProductCustomers)
                            AND ic.CustomerID = pc.CustomerID THEN
                            ic.CustomerID
                    END
    FROM    IndustryCustomers   AS ic
    FULL JOIN ProductCustomers AS pc
        ON pc.CustomerID = ic.CustomerID
) AS x
WHERE   x.CustomerID IS NOT NULL
OPTION (MAXRECURSION 10000);

SET STATISTICS IO, TIME OFF

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Stephan
Solution 2	Heinzi
Solution 3