SQL: picking distinct values based on rank
I'm trying to find the rank/row_number of IDs in a dataset and assign exactly one ID to each cluster based on rank. The catch is that the same ID can be rank 1 for two different clusters. In that case, if an ID has already been assigned to one cluster, the next-ranked ID should be assigned to the other cluster.
| CLUSTER | ID | RNK |
|---|---|---|
| CLST1 | ID1 | 1 |
| CLST1 | ID2 | 2 |
| CLST2 | ID1 | 1 |
| CLST2 | ID2 | 2 |
In this dataset, if ID1 is assigned to CLST1, then ID2 must be picked for CLST2 based on rank. How can I achieve this in Redshift?
Solution 1:[1]
If you want neither duplicate rank numbers nor gaps, use row_number().
The following script shows the difference between rank(), dense_rank() and row_number() when there is a duplicate value.
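The script reads from a table t that contains a duplicate id. A minimal setup, inferred from the output shown (the integer column type is an assumption), might look like this:

```sql
-- Hypothetical setup for the demo table t; the values are taken from the
-- id column of the result, the column type is assumed.
create temp table t (id int);

insert into t (id) values
  (1), (2), (3), (3), (4), (5);  -- 3 appears twice to produce a tie
```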
```sql
select id,
       rank()       over (order by id) "rank",
       dense_rank() over (order by id) "dense_rank",
       row_number() over (order by id) "row_number"
from t;
```

| id | rank | dense_rank | row_number |
|---:|---:|---:|---:|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 3 | 3 |
| 3 | 3 | 3 | 4 |
| 4 | 5 | 4 | 5 |
| 5 | 6 | 5 | 6 |
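Applied to the sample data in the question, one possible sketch is to number the clusters and give the n-th cluster the ID ranked n. This rests on a strong assumption: every cluster ranks the same IDs in the same order, as in the sample. The table name cluster_ranks, the column name cluster_name, and the use of alphabetical cluster order as the assignment priority are all hypothetical.

```sql
-- Hypothetical table mirroring the question's sample data.
create temp table cluster_ranks (
  cluster_name varchar(16),
  id           varchar(16),
  rnk          int
);

insert into cluster_ranks (cluster_name, id, rnk) values
  ('CLST1', 'ID1', 1),
  ('CLST1', 'ID2', 2),
  ('CLST2', 'ID1', 1),
  ('CLST2', 'ID2', 2);

-- Number the clusters (1, 2, ...) in the order they should be served,
-- then keep the row whose rank equals the cluster's position.
-- Only valid when all clusters share the same ranking of IDs (assumption).
with ordered as (
  select cluster_name,
         id,
         rnk,
         dense_rank() over (order by cluster_name) as cluster_pos
  from cluster_ranks
)
select cluster_name, id
from ordered
where rnk = cluster_pos
order by cluster_name;
```

On the sample data this returns ID1 for CLST1 and ID2 for CLST2. If clusters can rank IDs differently, the assignment depends on which IDs earlier clusters have already taken, and this shortcut no longer applies; an iterative or greedy approach would be needed instead.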
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
