'Inner join of an EXCEPT subquery on column names that do not appear in the output
I have a table of students and classes. I would like to find what classes were dropped from one semester to another (and a similar query for classes added).
Student Class Semester
==============================
Alice English 11
Alice Geometry 11
Alice English 12
Bob Spanish 11
Bob Spanish 12
My approach is to use an except
(same as minus
):
select distinct Class
from table
where table.Student = 'Alice'
and table.Semester = 11
except
select distinct Class
from table
where table.Student = 'Alice'
and table.Semester = 12
This works correctly, returning Geometry
. However, I need to use this as a subquery like this:
select Student, string_agg(X.Class, ', ') as 'Deleted_Classes',
count(X) as 'Num_deleted',
SemesterTable.Semester as semester,
lag(Semester, 1)
over (partition by StudentTable.Student
order by SemesterTable.Semester) as Prev_Semester,
from
StudentTable
SemesterTable
inner join (
<<<Same query from above>>>
) X on _______
where X.Num_deleted > 0
My problems is with the ____
section - inner joins can only be joined on columns that appear in the output. But my except
query doesn't return the values of previous and current semesters (it might even return nothing at all if no classes were dropped). So how do I join the subquery into the main table? My desired output is:
Student Semester Prev Semester Deleted_Classes
========================================================
Alice 12 11 Geometry
Alice
appears because she had a change in her schedule, but Bob
is omitted because there was no change in his schedule.
Solution 1:[1]
I would do this via Left Join
with checking the availability of the next semester for a particular student via exist
in where
.
Select T.Student, T.Semester+1 As Semester, T.Semester As [Prev Semester],
string_agg(T.Class, ',') As Deleted_Classes
From Tbl As T Left Join Tbl As T1 On (T.Student=T1.Student
And T.Semester+1=T1.Semester
And T.Class=T1.Class)
Where Exists (Select * From Tbl
Where Student=T.Student
And Semester=T.Semester+1) And
T1.Semester Is Null
Group by T.Student, T.Semester+1, T.Semester
If your semester ids are not increasing strictly, you can use the same logic via cte
with dense_rank
to order semesters according to your criteria for each student as follows:
With CTE As (
Select Student, Semester, Class,
Dense_Rank() Over (Partition by Student Order by Semester) As N
From Tbl
)
Select T.Student, Max(T2.Semester) As Semester, Max(T.Semester) As [Prev Semester],
string_agg(T.Class, ',') As Deleted_Classes
From CTE As T Left Join CTE As T1 On (T.Student=T1.Student
And T.N+1=T1.N
And T.Class=T1.Class)
Cross Apply (Select Distinct Semester
From CTE
Where Student=T.Student
And N=T.N+1) As T2
Where T1.N Is Null
Group by T.Student, T.N+1, T.N
Result:
Student | Semester | Prev Semester | Deleted_Classes |
---|---|---|---|
Alice | 12 | 11 | Geometry |
to get both deleted classes and added classes with a single query, you can use the following:
With CTE As (
Select Student, Semester, Class,
Dense_Rank() Over (Partition by Student Order by Semester) As N
From Tbl
)
Select T.Student, Max(T1.Semester) As Semester, Max(T.Semester) As [Prev Semester],
Max(T2.Deleted_Classes) As Deleted_Classes, Max(T3.Added_Classes) As Added_Classes
From (Select Distinct Student, Semester, N From CTE) As T Cross Apply
(Select Distinct Student, Semester, N From CTE Where Student=T.Student And N=T.N+1) As T1
Outer Apply (Select Student, Semester, string_agg(Class, ', ') As Deleted_Classes
From CTE
Where Student=T.Student
And N=T.N
And Class Not In (Select Class From CTE Where Student=T.Student And N=T.N+1)
Group by Student, Semester) As T2
Outer Apply (Select Student, Semester, string_agg(Class, ', ') As Added_Classes
From CTE
Where Student=T.Student
And N=T.N+1
And Class Not In (Select Class From CTE Where Student=T.Student And N=T.N)
Group by Student, Semester) As T3
Group by T.Student, T.N+1, T.N
Having Max(T2.Deleted_Classes) Is Not Null Or Max(T3.Added_Classes) Is Not Null
db<>fiddle here
Result:
Student | Semester | Prev Semester | Deleted_Classes | Added_Classes |
---|---|---|---|---|
Alice | 11 | 10 | Portuguese | English, Math |
Alice | 12 | 11 | Geometry, Math | Biology, Geography |
Bob | 12 | 11 | Portuguese | Math |
Solution 2:[2]
with data as (
select *,
min(Semester) over (partition by Student, Class) as minSemester,
max(Semester) over (partition by Student, Class) as maxSemester,
count(*) over (partition by Student, Class) as cntSemester
from T
where Semester in (11, 12)
)
select Student, Class,
case when minSemester = 12 then 'Added' else '' end as Added,
case when maxSemester = 11 then 'Dropped' else '' end as Dropped
from data
where maxSemester = 11;
There's all kinds of information you could derive from these values. For example, knowing that the latest semester was not 12 implies that class was dropped. You can do similar for additions.
Solution 3:[3]
Using a NOT EXISTS
seems appropriate for this.
create table StudentSemesters ( Student varchar(30), Class varchar(30), Semester int ); insert into StudentSemesters (Student, Class, Semester) values ('Alice', 'English', 11) , ('Alice', 'Geometry', 11) , ('Alice', 'English', 12) , ('Bob', 'Spanish', 11) , ('Bob', 'Spanish', 12) ;
select Student , Semester+1 as [Semester] , Semester as [Prev_Semester] , STRING_AGG(Class, ', ') as [Deleted_Classes] from StudentSemesters t where not exists ( select 1 from StudentSemesters t2 where t2.Student = t.Student and t2.Class = t.Class and t2.Semester = t.Semester+1 ) and exists ( select 1 from StudentSemesters t2 where t2.Student = t.Student and t2.Semester = t.Semester+1 ) group by Student, Semester;
Student | Semester | Prev_Semester | Deleted_Classes |
---|---|---|---|
Alice | 12 | 11 | Geometry |
select Student , Semester as [Semester] , Semester-1 as [Prev_Semester] , STRING_AGG(Class, ', ') as [Added_Classes] from StudentSemesters t where not exists ( select 1 from StudentSemesters t2 where t2.Student = t.Student and t2.Class = t.Class and t2.Semester = t.Semester-1 ) group by Student, Semester;
Student | Semester | Prev_Semester | Added_Classes |
---|---|---|---|
Alice | 11 | 10 | English, Geometry |
Bob | 11 | 10 | Spanish |
Combined, using full join
select coalesce(t1.Student, t2.Student) as Student , coalesce(t2.Semester, t1.Semester+1) as [Semester] , coalesce(t1.Semester, t2.Semester-1) as [Prev_Semester] , STRING_AGG(t1.Class, ', ') as [Deleted_Classes] , STRING_AGG(t2.Class, ', ') as [Added_Classes] from StudentSemesters t1 full join StudentSemesters t2 on t2.Student = t1.Student and t2.Class = t1.Class and t2.Semester = t1.Semester+1 where (t1.Class is null or (t2.Class is null and exists ( select 1 from StudentSemesters t3 where t3.Student = t1.Student and t3.Semester = t1.Semester+1 ))) group by coalesce(t1.Student, t2.Student), coalesce(t2.Semester, t1.Semester+1), coalesce(t1.Semester, t2.Semester-1) order by Student, Semester;
Student | Semester | Prev_Semester | Deleted_Classes | Added_Classes |
---|---|---|---|---|
Alice | 11 | 10 | null | English, Geometry |
Alice | 12 | 11 | Geometry | null |
Bob | 11 | 10 | null | Spanish |
Demo on db<>fiddle here
Solution 4:[4]
Group rows ordered by Student and Semester according to Student, and in each group, group rows by Semester and perform inter-row calculations. The difference between the set of Classes in the spring semester and that in the fall semester is the newly added classes, and the difference between the fall semester and the spring semester contains the cancelled classes. It is a hassle to code this in SQL because you will need the window function, CROSSAPPLY and OUTERAPPLY. The SQL statement will be lengthy and difficult to understand. An alternative is to move data out of database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generate simple code. It can get this done with only two lines of code
A | |
---|---|
1 | =MSSQL.query@x("select * from Classes order by 1,3") |
2 | =A1.group@o(#1).conj(~.group@o(#1,#3;~.(#2)).new(Student,#2[+1]:Semester,#2:Prev_Semester,(#3#3[+1]).concat@c():Deleted_Classes,(#3[+1]#3).concat@c():Added_Classes).m(:-2)) |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | |
Solution 4 |