'How to create a new view that classifies each job based on the latest task code it has completed?

I have some data that I need to classify into various stages depending on the TaskCode that has an EndDate that is highest up in the hierarchy. Here the EndDate signifies the date that the TaskCode was completed. There are 8 TaskCodes that are relevant and signify that a Stage should be assigned to that job though there are dozens of TaskCodes that are irrelevant.

I have already written a program in Python that performs this task but must now complete it in SQL. The logic behind it in Python is as follows with "data" being the dataset described above and show as an example below:

df = pd.DataFrame(columns = ['JobNumber', 'Stage'])
for count, value in enumerate(data['JobNumber']):
  job = data[data['JobNumber'] == i]
  
  if job['TaskCode'].isin(052300):
    if job[job['CostCode'] == 052300]['ActualEndDate'].notna().all()
      df.loc[count, 'Stage'] = 'Final'
            continue

  if job['TaskCode'].isin(048000):
    if job[job['CostCode'] == 052300]['ActualEndDate'].notna().all()
      df.loc[count, 'Stage'] = 'Middle'
            continue

The code continues for each task that signifies a stage until at the end it provides a catch all "else:" statement that assigns all jobs that did not have any of the task codes.

There are two pieces I find particularly tricky: 1) there are cases where a job may have duplicates of a TaskCode that signal a stage but only one of them has an EndDate. In these cases we should not classify this job at the corresponding TaskCode stage as all duplicates must also have an EndDate. 2) The actual EndDate value is not important with respect to how we classify the data. It should move down a hierarchy of TaskCodes that then see if all values of that TaskCode for a specific job have non Null values. If that is the case then assign a stage.

I am not strong at SQL and have rarely used it outside of simple queries to retrieve data. Any help or suggestions would be greatly appreciated.

JobNumber	EndDate	TaskCode
12345	01/12/2022	052300
99999	NULL	048000
23780	11/24/2021	789010
12345	NULL	052300

Expected result table:

JobNumber	Stage
12345	Final
99999	Start
23780	Middle

Edit: My current query of -

SELECT JT.JobNumber,
CASE WHEN TaskCode = '456787' AND EndDate IS NOT NULL THEN 'Final'
     WHEN CostCode = '345881' AND EndDate IS NOT NULL THEN 'Interior'
     WHEN TaskCode = '986482' AND EndDate IS NOT NULL THEN 'Trim'
     WHEN TaskCode = '456480' AND EndDate IS NOT NULL THEN 'Sheetrock'
     WHEN TaskCode = '548638' AND EndDate IS NOT NULL THEN 'Mechanicals'
     WHEN TaskCode = '001001' AND EndDate IS NOT NULL THEN 'Cornice'
     WHEN TaskCode = '010840' AND EndDate IS NOT NULL THEN 'Frame'
     WHEN TaskCode = '049100' AND EndDate IS NOT NULL THEN 'Frame'
     ELSE 'Slab' END AS Stage
FROM [Task] AS JT
INNER JOIN [Jobs] AS CJ ON JT.JobNumber = CJ.JobNumber

is close to what I am looking for. However, the result of this returns the job with a stage classification for every task that job has (many duplicates of the same JobNumber). My desired result is a unique JobNumber column with one classification for the stage in which the TaskCode of the highest priority is satisfied (the highest up in the CASE statement) for each job.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'How to create a new view that classifies each job based on the latest task code it has completed?

Sources

Related Questions