'How to create a new view that classifies each job based on the latest task code it has completed?

I have some data that I need to classify into various stages depending on the TaskCode that has an EndDate that is highest up in the hierarchy. Here the EndDate signifies the date that the TaskCode was completed. There are 8 TaskCodes that are relevant and signify that a Stage should be assigned to that job though there are dozens of TaskCodes that are irrelevant.

I have already written a program in Python that performs this task but must now complete it in SQL. The logic behind it in Python is as follows with "data" being the dataset described above and show as an example below:

df = pd.DataFrame(columns = ['JobNumber', 'Stage'])
for count, value in enumerate(data['JobNumber']):
  job = data[data['JobNumber'] == i]
  
  if job['TaskCode'].isin(052300):
    if job[job['CostCode'] == 052300]['ActualEndDate'].notna().all()
      df.loc[count, 'Stage'] = 'Final'
            continue

  if job['TaskCode'].isin(048000):
    if job[job['CostCode'] == 052300]['ActualEndDate'].notna().all()
      df.loc[count, 'Stage'] = 'Middle'
            continue

The code continues for each task that signifies a stage until at the end it provides a catch all "else:" statement that assigns all jobs that did not have any of the task codes.

There are two pieces I find particularly tricky: 1) there are cases where a job may have duplicates of a TaskCode that signal a stage but only one of them has an EndDate. In these cases we should not classify this job at the corresponding TaskCode stage as all duplicates must also have an EndDate. 2) The actual EndDate value is not important with respect to how we classify the data. It should move down a hierarchy of TaskCodes that then see if all values of that TaskCode for a specific job have non Null values. If that is the case then assign a stage.

I am not strong at SQL and have rarely used it outside of simple queries to retrieve data. Any help or suggestions would be greatly appreciated.

JobNumber EndDate TaskCode
12345 01/12/2022 052300
99999 NULL 048000
23780 11/24/2021 789010
12345 NULL 052300

Expected result table:

JobNumber Stage
12345 Final
99999 Start
23780 Middle

Edit: My current query of -

SELECT JT.JobNumber,
CASE WHEN TaskCode = '456787' AND EndDate IS NOT NULL THEN 'Final'
     WHEN CostCode = '345881' AND EndDate IS NOT NULL THEN 'Interior'
     WHEN TaskCode = '986482' AND EndDate IS NOT NULL THEN 'Trim'
     WHEN TaskCode = '456480' AND EndDate IS NOT NULL THEN 'Sheetrock'
     WHEN TaskCode = '548638' AND EndDate IS NOT NULL THEN 'Mechanicals'
     WHEN TaskCode = '001001' AND EndDate IS NOT NULL THEN 'Cornice'
     WHEN TaskCode = '010840' AND EndDate IS NOT NULL THEN 'Frame'
     WHEN TaskCode = '049100' AND EndDate IS NOT NULL THEN 'Frame'
     ELSE 'Slab' END AS Stage
FROM [Task] AS JT
INNER JOIN [Jobs] AS CJ ON JT.JobNumber = CJ.JobNumber

is close to what I am looking for. However, the result of this returns the job with a stage classification for every task that job has (many duplicates of the same JobNumber). My desired result is a unique JobNumber column with one classification for the stage in which the TaskCode of the highest priority is satisfied (the highest up in the CASE statement) for each job.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source