'Remove duplicates based on multiple columns and set conditions to choose one unique record
I need SQL help in order to get rid of a duplicate in one column, based on the conditions set from other columns.
The Audit column can have NULL values. I want to prioritize the record which:
- Gets rid of the duplicate Color
- Then sorts by the earliest Date
- However, if the Audit column is NULL for the earlier Date, I want to select the next record where it isn't.
- If the Audit column is NULL for all entries, I still want the entry with the earliest Date to be selected.
Example below.
Here's what I have:
| ID | Color | Date | Audit |
|---|---|---|---|
| 1 | red | 4-1-21 | Y |
| 2 | red | 4-2-21 | N |
| 3 | blue | 4-3-21 | |
| 4 | blue | 4-4-21 | Y |
| 5 | pink | 4-5-21 | |
| 6 | pink | 4-6-21 |
Here's what I want:
| ID | Color | Date | Audit |
|---|---|---|---|
| 1 | red | 4-1-21 | Y |
| 4 | blue | 4-4-21 | Y |
| 5 | pink | 4-5-21 |
Can anyone help me figure out the SQL logic to properly execute this?
Solution 1:[1]
This answer assumes that your want to choose the yes audit record, if available, followed by no, then followed by NULL, if neither of the previous two be available.
SELECT DISTINCT ON (Color) *
FROM yourTable
ORDER BY Color, Audit DESC NULLS LAST, Date;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Biegeleisen |
