'How to Include/Exclude array of IDs from a relationship/pivot table and avoid duplicates?

Let's say you have

  • records table with id and name
  • tags table with id and name
  • records_tags with record_id and tags_id (relationship table)

Now you want to run a query to include records that have X tags and exclude records that have X tags.

You could do INNER JOIN, but the challenge here is, when there are many tags to a record, it creates duplicates within the results.

Example:

inner join `records_tags` on `records_tags`.`record_id` = `records`.`id` 
and `records_tags`.`tag_id` in (?) and `records_tags`.`tag_id` not in (?)

As for the Laravel side, Ive used:

$records->join('records_tags', function ($join) use($include, $exclude) {
   $join->on('records_tags.record_id','=','records.id');
   if ($include) $join->whereIn('records_tags.tag_id',$include);
   if ($exclude) $join->whereNotIn('records_tags.tag_id',$exclude);
});

Could there be a better solution to handle this or a way to ask for it to create unique or distinct rows, the goal of the join is only to include or exclude the actual records themselves from the results?

Edit:

The only other thing I can think of is doing something like this, still have to run tests to see accuracy, but for a crude solution

Edit 2: This doesn't appear to work on NOT IN as it creates duplicates.

$records->join(\DB::raw('(SELECT tag_id, record_id FROM records_tags WHERE records_tags.tag_id IN ('.implode(',',$include).'))'),'records_tags.record_id','=','records.id');


Solution 1:[1]

The conditions in the ON clause:

... and `records_tags`.`tag_id` in (?) and `records_tags`.`tag_id` not in (?)

do not exclude from the results the ids of the records that you want to exclude.
Any id that is linked to any of the wanted tags will be returned even if it is also linked to an unwanted tag, because the joins return 1 row for each of the linked tags.

What you can use is aggregation and the conditions in the HAVING clause:

SELECT r.id, r.name
FROM records r INNER JOIN records_tags rt 
ON rt.record_id = r.id
GROUP BY r.id -- I assume that id is the primary key of records
HAVING SUM(rt.tag_id IN (?)) > 0
   AND SUM(rt.tag_id IN (?)) = 0; 

or, if you want the ids that are linked to all the wanted tags, use GROUP_CONCAT():

SELECT r.id, r.name
FROM records r INNER JOIN records_tags rt 
ON rt.record_id = r.id
GROUP BY r.id 
HAVING GROUP_CONCAT(rt.tag_id ORDER BY rt.tag_id) = ?; 

In this case you will have to provide for the wanted tags ? placeholder a sorted comma separated list of ids.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1