'Question about WHERE EXISTS and subquery on MySQL Coursera page

Edited again to add: the 2nd table ("dogs") have null user_guid values. If i filter out the NULL values then the NOT IN operator works without WHERE. And I still dont know why :|


Edited: Sorry I realized that I flipped the question. The question was the number of users who are in "users" table and NOT in the "dogs" table. My original code was:

select count(distinct u.user_guid)
from users u
where u.user_guid not IN (select d.user_guid
                         from dogs d)

Which got result of 0 count. The answer was supposed to be 2226 Only when I add the WHEN condition then i got the right answer:

select count(distinct u.user_guid)
from users u
where u.user_guid not IN (select d.user_guid
                         from dogs d
                         WHERE d.user_guid = u.user_guid);

I still don't know why the WHERE condition makes the difference here


The question was:

How could you determine the number of unique users in the "users" table who were also in the "dogs" table?

The correct answer was supposed to be

select count(u.user_guid)
from users u
where u.user_guid IN (select d.user_guid
                         from dogs d
                         where u.user_guid=d.user_guid);

Now I don't understand the point of the last WHERE filter. I thought the "IN" operator would have already made sure to filter only for values of u.user_guid that also exists in the d.user_guid (ie. the "IN" operator would have ensured that there are matching values) , which is the same as what the WHERE condition did (ie. u.user_guid = d.user_guid). And yet when I run the 2 codes (with and without the last WHERE line), the results were different. I'm not sure where I went wrong :/



Solution 1:[1]

Below query finds the users who exists and are not unique on the dogs table , and based on the users found on the dogs table , only the unique values are returned from the users table:

select u.user_guid
from users u
where exists ( select d.user_guid
                         from dogs d
                         where u.user_guid=d.user_guid
             )
group by user_guid
having count(*)=1;

The last where condition where u.user_guid=d.user_guid returns only the users from dogs table which are found on the users table, so it should produce wrong results in your case.

If you want only the number change u.user_guid with count(u.user_guid)

select count(u.user_guid)
from users u
where exists ( select d.user_guid
                         from dogs d
                         where u.user_guid=d.user_guid
             )
group by user_guid
having count(*)=1;

Solution 2:[2]

Hey guys I think you're overcomplicating things here.
If you do a simple join you only get the values which are in both tables. If you do a left join the fields which aren't matched in the second table return null.
NB: if we join on a column which is nullable we could get false results but that's not good in any case.

create table dogs (d int);
create table user (u int);
insert into dogs values
(1),(2),(3),(4),
(1),(2),(5),(6);
insert into user values
(1),(2),(3),(4),
(1),(2),(7),(8);
SELECT COUNT(DISTINCT d) as du 
FROM dogs join user on d=u;
| du |
| -: |
|  4 |
SELECT COUNT(DISTINCT d) as dd
FROM dogs LEFT JOIN user on d=u
WHERE u IS NULL;
| dd |
| -: |
|  2 |

db<>fiddle here

Solution 3:[3]

If you want to select unique users which are not available in dogs table then it's better to use not exists.

     select count(distinct u.user_guid) unique_users    
     from users u     
     where not exists (select 1 from dogs d where u.user_guid = d.user_guid);

not exists and not in are almost do the same job but they handled null in different ways. Below link will explain that

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

If you want to count unique users you might need distinct keyword if there are multiple rows for any user. For example: count( distinct u.user_guid).

For IN you don't need where u.user_guid=d.user_guid. But I would suggest the solution usingexists

UsingIN:

select count( u.user_guid)
from users u
where u.user_guid IN (select d.user_guid
                         from dogs d);

Using Exists:

select count( u.user_guid)
from users u
where exists IN (select 1
                         from dogs d
                          where u.user_guid=d.user_guid);

Examples of IN and exists In users table there are three value for user_guid. 1,2 and 3. 2 is there two times. But in dogs table user_guid are 2,3 and 5. 2 and 3 are common in both tables. Please check the queries and outputs.

Schema and insert statements:

 create table users(user_guid int);
 create table dogs(user_guid int);

 insert into users values(1);
 insert into users values(2);
 insert into users values(2);
 insert into users values(3);

 insert into dogs values(2);
 insert into dogs values(3);
 insert into dogs values(5);

Query: Using IN select count(distinct u.user_guid) unique_users
from users u
where u.user_guid IN (select d.user_guid from dogs d );

Output:

unique_users
2

Query: Using EXISTS

 select count(distinct u.user_guid) unique_users    
 from users u     
 where exists (select 1 from dogs d where u.user_guid = d.user_guid);

Output:

unique_users
2

db<>fiddle here

Solution 4:[4]

ok so it's weird to answer my own question but I found out the reason why IN worked without the WHERE condition yet NOT IN absolutely needed the WHERE condition (in the case of my queries). It's because the "dogs" dataset has a few NULL user_guid values.

Eg. fiddle here

CREATE TABLE users (
  id INT
);
INSERT INTO users (id) VALUES (1);
INSERT INTO users (id) VALUES (2);
INSERT INTO users (id) VALUES (3);

CREATE TABLE dogs (
  id INT
);
INSERT INTO dogs (id) VALUES (3);
INSERT INTO dogs (id) VALUES (4);
INSERT INTO dogs (id) VALUES (NULL);

If I use IN without WHERE:

select count(distinct u.id)
from users u
where u.id in (select d.id
              from dogs d
                        );

the result is 1 (the value 3), because SQL just need to find the match for "3" once, no need to care about all other values including the NULL value. But if I use NOT IN without the last WHERE, instead of giving me result of 2 (ID 1 and 2), the result was 0.

select count(distinct u.id)
from users u
where u.id not in (select d.id
                    from dogs d);

This is because SQL had to make sure the IDs 1 and 2 in users table didn't match with any of the rows in dogs table, meaning it had to compare ALL rows including the NULL rows. And NULL value is considered as unknown hence it can't be compared to a non-NULL value. So I actually can just use WHERE d.user_guid is not null as well, it would also return the same number.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ergest Basha
Solution 2
Solution 3
Solution 4