Efficiently inserting non-duplicate variations of rows into the same table (EXCEPT / NOT EXISTS?)

I have a process which generates a number of potentially asymmetric outcomes (record A links to record B, but record B may not link to record A). Each of these outcomes is stored in a table, and I want to insert into that same table all the missing links (i.e. generate a row for every case where record A links to record B but record B does not yet link back to record A), without generating duplicates.

From looking around it seems that NOT EXISTS is the preferred method for this. But as this is an INSERT into the same table, I wanted to see if anyone had ideas for a more efficient approach (table size will vary from ~50,000 to ~20,000,000).

INSERT INTO [table1]
([record_id], [linked_record_id], [flag_value])
SELECT
 [linked_record_id] AS [record_id]
,[record_id]        AS [linked_record_id]
,[flag_value]
FROM [table1] AS A
WHERE [flag_value] = 1
AND NOT EXISTS (
    -- the reversed link (B -> A) must match on both columns
    SELECT 1
    FROM [table1] AS B
    WHERE B.[flag_value] = 1
    AND B.[record_id] = A.[linked_record_id]
    AND B.[linked_record_id] = A.[record_id]
    )


Solution 1:[1]

  1. NOT IN
  2. NOT EXISTS
  3. LEFT JOIN ... WHERE ... IS NULL

Any of these three approaches lets you insert the missing rows without creating duplicates.
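As a rough illustration of the NOT EXISTS approach applied to the table in the question, the sketch below inserts each missing reverse link. It assumes SQL Server syntax and that a link counts as already present only when both columns match. The DISTINCT is an added precaution, not part of the original query, in case the same link is stored more than once.

```sql
-- Sketch only: insert B -> A wherever A -> B exists but B -> A does not.
INSERT INTO [table1] ([record_id], [linked_record_id], [flag_value])
SELECT DISTINCT              -- assumption: guards against duplicate source rows
     A.[linked_record_id]    -- becomes the new [record_id]
    ,A.[record_id]           -- becomes the new [linked_record_id]
    ,A.[flag_value]
FROM [table1] AS A
WHERE A.[flag_value] = 1
AND NOT EXISTS (
    SELECT 1
    FROM [table1] AS B
    WHERE B.[flag_value] = 1
    AND B.[record_id] = A.[linked_record_id]
    AND B.[linked_record_id] = A.[record_id]
    );
```

Rows that link a record to itself are skipped automatically, because such a row is its own reverse.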

If your columns are properly indexed, or if you expect to filter out most rows (i.e., many rows already have a match in the subquery), NOT EXISTS tends to perform better. NOT EXISTS and NOT IN are both good ways to search for missing values, as long as the columns being compared are NOT NULL. If the subquery can return a NULL, NOT IN returns no rows at all, so NOT EXISTS is the safer choice for nullable columns.
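Two small sketches may make the indexing and NULL points more concrete. The index name and column choice are assumptions for illustration, not something taken from the question, and the aliases v, s, x and y in the second part are made up. SQL Server syntax is assumed throughout.

```sql
-- Hypothetical index (name and columns are assumptions) that lets an
-- anti-join probe seek on both link columns instead of scanning [table1].
CREATE INDEX [IX_table1_link]
ON [table1] ([record_id], [linked_record_id])
INCLUDE ([flag_value]);

-- NOT IN: the NULL in the subquery makes the predicate UNKNOWN for x = 2,
-- so this query returns no rows at all.
SELECT v.x
FROM (VALUES (1), (2)) AS v(x)
WHERE v.x NOT IN (SELECT s.y FROM (VALUES (1), (NULL)) AS s(y));

-- NOT EXISTS: the NULL simply never matches, so this query returns x = 2.
SELECT v.x
FROM (VALUES (1), (2)) AS v(x)
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES (1), (NULL)) AS s(y)
    WHERE s.y = v.x
    );
```

Whether the index pays off depends on table size and write volume, so it is worth checking the execution plan before and after creating it.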

LEFT JOIN / IS NULL is less efficient, since it makes no attempt to skip the already matched values in the right table, returning all results and filtering them out instead.
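For comparison, the same insert can be written with LEFT JOIN / IS NULL or, as the question title suggests, with EXCEPT. Both are sketches that assume SQL Server syntax and the column names from the question; use one or the other, and compare execution plans on your own data rather than relying on a general rule.

```sql
-- Variant 1: LEFT JOIN / IS NULL.
-- Assumes [record_id] is NOT NULL, so a NULL here can only mean "no match".
INSERT INTO [table1] ([record_id], [linked_record_id], [flag_value])
SELECT A.[linked_record_id], A.[record_id], A.[flag_value]
FROM [table1] AS A
LEFT JOIN [table1] AS B
    ON  B.[record_id] = A.[linked_record_id]
    AND B.[linked_record_id] = A.[record_id]
    AND B.[flag_value] = 1
WHERE A.[flag_value] = 1
AND B.[record_id] IS NULL;

-- Variant 2: EXCEPT.
-- Takes every reversed link and subtracts the links that already exist.
-- EXCEPT compares all selected columns and returns distinct rows only.
INSERT INTO [table1] ([record_id], [linked_record_id], [flag_value])
SELECT [linked_record_id], [record_id], [flag_value]
FROM [table1]
WHERE [flag_value] = 1
EXCEPT
SELECT [record_id], [linked_record_id], [flag_value]
FROM [table1]
WHERE [flag_value] = 1;
```

Because EXCEPT removes duplicates from its result, it will not insert the same reverse link twice even if the source link is stored more than once, which the LEFT JOIN variant does not guarantee.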


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 PratikLad-MT