'Can I use the ALL operator to test that all values of a "group" exist in another query/set? If so, how?
- I have this database for a CS/database-theory homework question for a hypothetical movie store company:
- For those who might be unfamiliar with the concept, a movie store is a retail location where patrons can film productions on VHS tape, or this newfangled format called "DVD".
- Who knows, maybe some time in the distant, far-off, future we might be able to view movies directly over the Internet?
- The DDL and sample data is below.
- For those who might be unfamiliar with the concept, a movie store is a retail location where patrons can film productions on VHS tape, or this newfangled format called "DVD".
- I need to write a query that will show all movies that are available in all three Chicago stores: (
WI01,WI02, andWI03).- By looking at the raw data below ourselves, we can see that only these 3
movieIdvalues (D00001,D00006, andD00007) havemovie_storerows for everystorelocated in Chicago.
- By looking at the raw data below ourselves, we can see that only these 3
CREATE TABLE movie (
movieId varchar(6) NOT NULL PRIMARY KEY,
title nvarchar(50) NOT NULL
);
CREATE TABLE store (
storeId varchar(4) NOT NULL PRIMARY KEY,
city nvarchar(20) NOT NULL
);
CREATE TABLE movie_store (
movieid varchar(6) FOREIGN KEY REFERENCES movie ( movieId ),
storeid varchar(4) FOREIGN KEY REFERENCES store ( storeId ),
PRIMARY KEY ( movieId, storeId )
);
GO
INSERT INTO movie ( movieId, title )
VALUES
('D00001', N'True Lies'),
('D00002', N'Predator'),
('D00003', N'Last Action Hero'),
('D00004', N'Red Heat'),
('D00005', N'Conan 1'),
('D00006', N'Conan 2'),
('D00007', N'Red Sonja');
INSERT INTO store ( storeId, city ) VALUES
('WI01', N'Chicago'),
('WI02', N'Chicago'),
('WI03', N'Chicago'),
('IL01', N'Atlanta'),
('IL02', N'Nashville');
INSERT INTO movie_store ( movieId, storeId ) VALUES
-- True Lies:
('D00001', 'WI01'),
('D00001', 'WI02'),
('D00001', 'WI03'),
-- 'Predator:
('D00002', 'IL01'),
('D00002', 'IL02'),
-- Last Action Hero:
('D00003', 'WI01'),
-- Red Heat:
('D00004', 'WI01'),
('D00004', 'WI02'),
('D00004', 'IL02'),
-- Conan 1:
('D00005', 'WI01'),
('D00005', 'WI02'),
-- Conan 2:
('D00006', 'WI01'),
('D00006', 'WI02'),
('D00006', 'WI03'),
-- Red Sonja:
('D00007', 'WI01'),
('D00007', 'WI02'),
('D00007', 'WI03');
- During my problem-solving research I found a site explaining the
ALLoperator.- My query is getting unique
storeIdsfor Chicago. - It is then trying to get the movie
titlewith astoreIdrecord for each one of the Chicago locations.
- My query is getting unique
WITH chicagoStores AS (
SELECT DISTINCT
storeId
FROM
store
WHERE
city = 'Chicago'
)
SELECT DISTINCT
m.title
FROM
movie AS m
INNER JOIN movie_store AS y ON m.movieid = y.movieid
INNER JOIN store AS s ON y.storeid = s.storeid
WHERE
s.storeId = ALL( SELECT storeId FROM chicagoStores )
But my query returns zero rows (and no errors), am I misunderstanding the ALL operator?
Solution 1:[1]
Try this
select city,title,count(*)
from
store
inner join movie_store
on store.storeid = movie_store.storeid
inner join movie
on movie_store.movieid = movie.movieid
where city = 'Chicago'
group by city,title
having count(*) = 3
Solution 2:[2]
It appears I had the wrong idea about ALL. I realized I could write my query this way to get the movies that appeared in all Chicago locations. Thanks for your help everyone.
with stores as (
select count(distinct(storeid)) as count_store
from store
where city = 'Chicago'
),
count_movies as (
select z.title, count(*) as count
from movie z
join movie_store y on (z.movieid = y.movieid)
join store x on (y.storeid = x.storeid)
where x.city = 'Chicago'
group by z.title
having count(*) = (select count_store from stores)
)
select title from count_movies
Solution 3:[3]
This can be done without a window and without mentioning chicago 2 times
select m.title,
s.city
from store s
inner join movie_store ms on s.storeid = ms.storeid
inner join movie m on ms.movieid = m.movieid
cross apply (select count(1) numbers from store s2 where s2.city = s.city group by s2.city) c
where s.city = 'Chicago'
group by m.title, s.city, c.numbers
having count(s.storeid) = c.numbers
Try it yourself at this DBFiddle
Solution 4:[4]
After plugging-away for a bit more, and reading this article on Relational Division, I think I've found a much shorter query compared to my original answer:
SELECT
m.movieId,
m.title
FROM
movie AS m
WHERE
NOT EXISTS (
SELECT m.movieId, s.storeId FROM store AS s WHERE city = 'Chicago'
EXCEPT
SELECT r.movieId, r.storeId FROM movie_store AS r WHERE r.movieId = m.movieId
);
The explanation below is based on this slightly longer-form version which is otherwise identical in its relational-calculus:
WITH chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
)
SELECT
m.movieId,
m.title
FROM
movie AS m
WHERE
NOT EXISTS (
-- Generate rows we'd *expect to exist* if a given `m.movieId` is present in every Chicago storeId:
SELECT
m.movieId,
s.storeId
FROM
chicagoStores AS s
-- Then subtract the `movie_store` rows that *actually exist* for this m.movieId:
EXCEPT
SELECT
a.movieId,
a.storeId
FROM
movie_store AS a
WHERE
a.movieId = m.movieId
);
- Generate the same
chicagoStoresset. - Then, filter
movieIdvalues inmovie(inWHERE) by...- Generate a set of hypothetical
movieId, storeIdtuples (for that specificmovieIdvalue) fromchicagoStores... - ...and subtract that from the actual rows in
movie_store(usingEXCEPT)...- This is the same thing as
whatIfEveryChicagoStoreHadEveryMoviefrom my original answer, but computed on a a per-row basis, instead of (conceptually) generating the( movies X stores ) EXCEPT movie_storeCartesian Product subtraction all-at-once.- ...with the implication that this might require less maximum total memory on the database-server, but that's entirely dependent on the execution-plan and DB engine.
- This is the same thing as
- ...and if there are any hypothetical rows remaining (using
EXISTS) after theEXCEPTthen it means that themovieIdis not available in all Chicagoan stores. - So if we invert the predicate (
NOT EXISTS) then that means we can test to see if a specificmovieIdvalue is available in all Chicagoan stores.- But be careful as it would also have false-posities when/if the
chicagoStorescollection was empty (asNOT EXISTS ( x EXCEPT y )is alwaystruewhenxis empty, even ifyis also empty.- As a workaround, change
FROM movietoFROM anyChicagoStoreMovieIds(whereanyChicagoStoreMovieIdsis defined in my other answer).
- As a workaround, change
- But be careful as it would also have false-posities when/if the
- Generate a set of hypothetical
Solution 5:[5]
The
x = ALL yoperator isn't what you want.- It sounds like you want an operator to test that "all values in x are also in y", aka an "
ALL IN" operator.- Unfortunately
ALL INdoes not exist in SQL, despite its obvious utility.
- Unfortunately
- The
x = ALL yoperator actually tests to see if all values iny(the right-hand-side single-column table-expression) equal the single scalar valuex.- This functionality is not relevant to the problem as we don't need to test that some result-list equals some single row's column.
- The other operators aren't of much use either (e.g.
x != ALL yorx < ALL y).
- It sounds like you want an operator to test that "all values in x are also in y", aka an "
While there is no
ALL INoperator in SQL, the concept exists in relational-algebra where it's known as relational division.- Conceptually (and if it existed in SQL) then
x DIVIDE y ON y.foo = x.foowould by likeGROUP BY x.foo HAVING x.foo ALL IN ( SELECT foo FROM y ) UNGROUP(of course,UNGROUPisn't a thing either). - It's a common PITA to implement relational-division in SQL.
- Conceptually (and if it existed in SQL) then
From a high-level, the problem can be broken-down into 4 steps:
- Get the set of
storeIdvalues for Stores in Chicago.- i.e.
SELECT storeId FROM store WHERE city = 'Chicago'
- i.e.
- Separately, get the set of
movieIdvalues for all movies in those stores.- i.e.
SELECT * FROM movie_store AS ms INNER JOIN chicagoStores AS cs ON ms.storeId = cs.storeId
- i.e.
- Then group/partition the
chicagoMoviesset by their separatemovieIdvalues. - Then filter out those groups/partitions where each partition's set of
storeIdvalues does not equal thechicagoStoresset.
But here's the hard-part: SQL does not offer a way to evalaute a predicate condition for each partition in a GROUP BY query.
SQL is more like Relational Calculus, where you describe what you want, as opposed to Relational Algebra, where you describe how you want it done. Linq is an example of a relational-algebra query language in comparison to SQL.
In Linq (for in-memory objects, not Entity Framework), you would do it like this:
HashSet<StoreId> chicagoStores = Stores
.Where( s => s.City == "Chicago" )
.Select( s => s.StoreId )
.ToHashSet();
MovieStores
.GroupBy( ms => ms.StoreId )
.Where( grp => grp.All( ms => chicagoStores.Contains( ms.StoreId ) ) )
.SelectMany( grp => grp )
.Select( ms => ms.Movie )
So I went for a completely different approach in SQL:
Get the set of
storeIdvalues for Stores in Chicago:WITH chicagoStores AS ( SELECT storeId FROM store WHERE city = 'Chicago' ) StoreId ------- WI01 WI02 WI03Get the set of
movieIdvalues for movies that are in at least 1 Chicago store.WITH anyChicagoStoreMovieIds AS ( SELECT DISTINCT ms.movieId FROM movie_store AS ms INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId ) movieId ------- D00001 D00003 D00004 D00005 D00006 D00007- Alternatively, the set of all
movieIdvalues inmoviecould be used, but doing that would make Step 3 potentially much slower.
- Alternatively, the set of all
Generate the
CROSS APPLYof Step 1 and Step 2, which generates the Cartesian Product of every ChicagoanstoreIdwith everymovieId.- Hence why restricting it to a the smaller upper-bound set from Step 2 makes sense, as there's no point including
movieIdvalues that don't appear in any Chicago store.
WITH whatIfEveryChicagoStoreHadEveryMovie AS ( SELECT m.movieId, cs.movieId FROM chicagoStores AS cs CROSS APPLY anyChicagoStoreMovieIds AS m ) movieId storeId ------- D00001 WI01 D00001 WI02 D00001 WI03 D00003 WI01 D00003 WI02 D00003 WI03 D00004 WI01 D00004 WI02 D00004 WI03 D00005 WI01 D00005 WI02 D00005 WI03 D00006 WI01 D00006 WI02 D00006 WI03 D00007 WI01 D00007 WI02 D00007 WI03- Hence why restricting it to a the smaller upper-bound set from Step 2 makes sense, as there's no point including
Now the hard part:
- Consider that if any given Chicago
storeIdhad everymovieIdpossible, then such a row would already-exist inmovie_store... - ...therefore it follows that if we then subtract actual rows in
movie_storefrom Step 3's result, then we'll be left with the set of( movieId, storeId )tuples that don't exist but which would need to exist in order for every ChicagostoreIdto have that movie.
WITH chicagoanMoviesNotAvailableAtEveryChicagoanStore AS ( SELECT w.movieId, w.storeId FROM whatIfEveryChicagoStoreHadEveryMovie AS w LEFT OUTER JOIN movie_store AS ms ON w.movieId = ms.movieId AND w.storeId = ms.storeId WHERE ms.storeId IS NULL ) movieId storeId ---------------- D00003 WI02 D00003 WI03 D00004 WI03 D00005 WI03- Consider that if any given Chicago
Then it's just a matter of subtracting
chicagoanMoviesNotAvailableAtEveryChicagoanStorefromanyChicagoStoreMovieIds(from Step 3), which gives us the set ofmovieIdvalues that are available at every ChicagoanstoreId:WITH moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ( SELECT movieId FROM anyChicagoStoreMovieIds EXCEPT SELECT movieId FROM moviesNotInAtLeast1ChicagoStore ) movieId storeId ---------------- D00003 WI02 D00003 WI03 D00004 WI03 D00005 WI03Which can then be
INNER JOINed withmovieto get theirtitleinformation, etc:SELECT m.movieId, m.title FROM moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs INNER JOIN movie AS m ON ffs.movieId = m.movieId; movieId title ---------------- D00001 'True Lies' D00006 'Conan 2' D00007 'Red Sonja'
Thus giving the full final query:
WITH
chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
SELECT DISTINCT
ms.movieId
FROM
movie_store AS ms
INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
SELECT
m.movieId,
cs.storeId
FROM
chicagoStores AS cs
CROSS JOIN anyChicagoStoreMovieIds AS m
),
moviesNotInAtLeast1ChicagoStore AS (
SELECT
e.*
FROM
expectedMovieStores AS e
LEFT OUTER JOIN movie_store AS ms ON
e.movieId = ms.movieId
AND
e.storeId = ms.storeId
WHERE
ms.storeId IS NULL
),
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS (
SELECT
movieId
FROM
anyChicagoStoreMovieIds
EXCEPT
SELECT
movieId
FROM
moviesNotInAtLeast1ChicagoStore
)
SELECT
m.movieId,
m.title
FROM
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs
INNER JOIN movie AS m ON
ffs.movieId = m.movieId;
The CTEs that are only used once can be inlined to shorten the query somewhat:
WITH
chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
SELECT DISTINCT
ms.movieId
FROM
movie_store AS ms
INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
SELECT
m.movieId,
cs.storeId
FROM
chicagoStores AS cs
CROSS JOIN anyChicagoStoreMovieIds AS m
)
SELECT
m.movieId,
m.title
FROM
(
SELECT
movieId
FROM
anyChicagoStoreMovieIds
EXCEPT
SELECT
e.movieId
FROM
expectedMovieStores AS e
LEFT OUTER JOIN movie_store AS ms ON
e.movieId = ms.movieId
AND
e.storeId = ms.storeId
WHERE
ms.storeId IS NULL
) AS ffs
INNER JOIN movie AS m ON
ffs.movieId = m.movieId;
...which is stioll rather long and complex for what's described as a CS/SQL homework - this took me over 2 hours to figure out because it was driving me mad.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | RegBes |
| Solution 2 | user3691608 |
| Solution 3 | GuidoG |
| Solution 4 | |
| Solution 5 |
