'Inner join of an EXCEPT subquery on column names that do not appear in the output

I have a table of students and classes. I would like to find what classes were dropped from one semester to another (and a similar query for classes added).

Student    Class       Semester
==============================
Alice      English     11
Alice      Geometry    11
Alice      English     12
Bob        Spanish     11
Bob        Spanish     12

My approach is to use an except (same as minus):

select distinct Class
from table
where table.Student = 'Alice'
and table.Semester = 11
except
select distinct Class
from table
where table.Student = 'Alice'
and table.Semester = 12

This works correctly, returning Geometry. However, I need to use this as a subquery like this:

select Student, string_agg(X.Class, ', ') as 'Deleted_Classes', 
count(X) as 'Num_deleted',
SemesterTable.Semester as semester, 
lag(Semester, 1) 
over (partition by StudentTable.Student 
order by SemesterTable.Semester) as Prev_Semester,
from 
StudentTable
SemesterTable
inner join (
<<<Same query from above>>>
) X on _______
where X.Num_deleted > 0

My problems is with the ____ section - inner joins can only be joined on columns that appear in the output. But my except query doesn't return the values of previous and current semesters (it might even return nothing at all if no classes were dropped). So how do I join the subquery into the main table? My desired output is:

Student     Semester     Prev Semester   Deleted_Classes
========================================================
Alice        12          11              Geometry

Alice appears because she had a change in her schedule, but Bob is omitted because there was no change in his schedule.



Solution 1:[1]

I would do this via Left Join with checking the availability of the next semester for a particular student via exist in where.

Select T.Student, T.Semester+1 As Semester, T.Semester As [Prev Semester], 
       string_agg(T.Class, ',') As Deleted_Classes
From Tbl As T Left Join Tbl As T1 On (T.Student=T1.Student 
                                      And T.Semester+1=T1.Semester
                                      And T.Class=T1.Class)
Where Exists (Select * From Tbl 
              Where Student=T.Student 
                    And Semester=T.Semester+1) And
      T1.Semester Is Null
Group by T.Student, T.Semester+1, T.Semester

If your semester ids are not increasing strictly, you can use the same logic via cte with dense_rank to order semesters according to your criteria for each student as follows:

With CTE As (
Select Student, Semester, Class, 
       Dense_Rank() Over (Partition by Student Order by Semester) As N
From Tbl
)
Select T.Student, Max(T2.Semester) As Semester, Max(T.Semester) As [Prev Semester], 
        string_agg(T.Class, ',') As Deleted_Classes
From CTE As T Left Join CTE As T1 On (T.Student=T1.Student 
                                      And T.N+1=T1.N
                                      And T.Class=T1.Class)
              Cross Apply (Select Distinct Semester 
                           From CTE 
                           Where Student=T.Student          
                                 And N=T.N+1) As T2
Where T1.N Is Null
Group by T.Student, T.N+1, T.N

Result:

Student Semester Prev Semester Deleted_Classes
Alice 12 11 Geometry

to get both deleted classes and added classes with a single query, you can use the following:

With CTE As (
Select Student, Semester, Class, 
       Dense_Rank() Over (Partition by Student Order by Semester) As N
From Tbl
)
Select T.Student, Max(T1.Semester) As Semester, Max(T.Semester) As [Prev Semester], 
        Max(T2.Deleted_Classes) As Deleted_Classes, Max(T3.Added_Classes) As Added_Classes
From (Select Distinct Student, Semester, N From CTE) As T Cross Apply
     (Select Distinct Student, Semester, N From CTE Where Student=T.Student And N=T.N+1) As T1
              Outer Apply (Select Student, Semester, string_agg(Class, ', ') As Deleted_Classes
                           From CTE 
                           Where Student=T.Student          
                                 And N=T.N
                                 And Class Not In (Select Class From CTE Where Student=T.Student And N=T.N+1)
                           Group by Student, Semester) As T2
              Outer Apply (Select Student, Semester, string_agg(Class, ', ') As Added_Classes
                           From CTE 
                           Where Student=T.Student          
                                 And N=T.N+1
                                 And Class Not In (Select Class From CTE Where Student=T.Student And N=T.N)
                           Group by Student, Semester) As T3
Group by T.Student, T.N+1, T.N
Having Max(T2.Deleted_Classes) Is Not Null Or Max(T3.Added_Classes) Is Not Null

db<>fiddle here

Result:

Student Semester Prev Semester Deleted_Classes Added_Classes
Alice 11 10 Portuguese English, Math
Alice 12 11 Geometry, Math Biology, Geography
Bob 12 11 Portuguese Math

Solution 2:[2]

with data as (
    select *,
        min(Semester) over (partition by Student, Class) as minSemester,
        max(Semester) over (partition by Student, Class) as maxSemester,
        count(*) over (partition by Student, Class) as cntSemester
    from T
    where Semester in (11, 12)
)
select Student, Class,
    case when minSemester = 12 then 'Added'   else '' end as Added,
    case when maxSemester = 11 then 'Dropped' else '' end as Dropped
from data
where maxSemester = 11;

There's all kinds of information you could derive from these values. For example, knowing that the latest semester was not 12 implies that class was dropped. You can do similar for additions.

Solution 3:[3]

Using a NOT EXISTS seems appropriate for this.

create table StudentSemesters (
 Student varchar(30),
 Class varchar(30),
 Semester int
);

insert into StudentSemesters
(Student, Class, Semester) values
  ('Alice',     'English',     11) 
, ('Alice',     'Geometry',    11)
, ('Alice',     'English',     12)
, ('Bob',       'Spanish',     11) 
, ('Bob',       'Spanish',     12) 
;
select Student
, Semester+1 as [Semester] 
, Semester as [Prev_Semester]
, STRING_AGG(Class, ', ') as [Deleted_Classes] 
from StudentSemesters t
where not exists (
    select 1
    from StudentSemesters t2
    where t2.Student = t.Student
      and t2.Class = t.Class
      and t2.Semester = t.Semester+1
    )
  and exists (
    select 1
    from StudentSemesters t2
    where t2.Student = t.Student
      and t2.Semester = t.Semester+1
    )
group by Student, Semester;
Student Semester Prev_Semester Deleted_Classes
Alice 12 11 Geometry
select 
  Student
, Semester as [Semester] 
, Semester-1 as [Prev_Semester]
, STRING_AGG(Class, ', ') as [Added_Classes] 
from StudentSemesters t
where not exists (
    select 1
    from StudentSemesters t2
    where t2.Student = t.Student
      and t2.Class = t.Class
      and t2.Semester = t.Semester-1
    )
group by Student, Semester;
Student Semester Prev_Semester Added_Classes
Alice 11 10 English, Geometry
Bob 11 10 Spanish

Combined, using full join

select 
  coalesce(t1.Student, t2.Student) as Student
, coalesce(t2.Semester, t1.Semester+1) as [Semester] 
, coalesce(t1.Semester, t2.Semester-1) as [Prev_Semester]
, STRING_AGG(t1.Class, ', ') as [Deleted_Classes] 
, STRING_AGG(t2.Class, ', ') as [Added_Classes]  
from StudentSemesters t1
full join StudentSemesters t2
  on t2.Student = t1.Student
 and t2.Class = t1.Class
 and t2.Semester = t1.Semester+1
where (t1.Class is null or
 (t2.Class is null
  and exists (
    select 1
    from StudentSemesters t3
    where t3.Student = t1.Student
      and t3.Semester = t1.Semester+1
    )))
group by 
  coalesce(t1.Student, t2.Student), 
  coalesce(t2.Semester, t1.Semester+1), 
  coalesce(t1.Semester, t2.Semester-1) 
order by Student, Semester;
Student Semester Prev_Semester Deleted_Classes Added_Classes
Alice 11 10 null English, Geometry
Alice 12 11 Geometry null
Bob 11 10 null Spanish

Demo on db<>fiddle here

Solution 4:[4]

Group rows ordered by Student and Semester according to Student, and in each group, group rows by Semester and perform inter-row calculations. The difference between the set of Classes in the spring semester and that in the fall semester is the newly added classes, and the difference between the fall semester and the spring semester contains the cancelled classes. It is a hassle to code this in SQL because you will need the window function, CROSSAPPLY and OUTERAPPLY. The SQL statement will be lengthy and difficult to understand. An alternative is to move data out of database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generate simple code. It can get this done with only two lines of code

A
1 =MSSQL.query@x("select * from Classes order by 1,3")
2 =A1.group@o(#1).conj(~.group@o(#1,#3;~.(#2)).new(Student,#2[+1]:Semester,#2:Prev_Semester,(#3#3[+1]).concat@c():Deleted_Classes,(#3[+1]#3).concat@c():Added_Classes).m(:-2))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3
Solution 4