SQL: picking distinct values based on rank

I'm trying to compute the rank/row_number of IDs in a dataset and assign exactly one ID to each cluster based on that rank. The catch is that the same ID can be rank 1 for two different clusters. In that case, once an ID has been assigned to one cluster, the next-ranked ID should be assigned to the other cluster.

CLUSTER | ID  | RNK
:------ | :-- | --:
CLST1   | ID1 |   1
CLST1   | ID2 |   2
CLST2   | ID1 |   1
CLST2   | ID2 |   2

In this dataset, if ID1 is assigned to CLST1, then ID2 must be picked for CLST2 based on rank. How can I achieve this in Redshift?


Solution 1:[1]

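If you want neither duplicate rank numbers nor gaps, use row_number(). The following script shows how rank(), dense_rank() and row_number() differ when there is a duplicate value: rank() repeats the number for ties and then skips ahead, dense_rank() repeats it without leaving a gap, and row_number() always gives each row a distinct, consecutive number.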

select
  id,
  rank() over (order by id) "rank",
  dense_rank() over (order by id) "dense_rank",
  row_number() over (order by id) "row_number"
from t;
id | rank | dense_rank | row_number
-: | ---: | ---------: | ---------:
 1 |    1 |          1 |          1
 2 |    2 |          2 |          2
 3 |    3 |          3 |          3
 3 |    3 |          3 |          4
 4 |    5 |          4 |          5
 5 |    6 |          5 |          6
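
To try the query locally, a minimal setup could look like the sketch below. The table name t and the column id come from the query above; the int type and the inserted values are assumptions chosen to match the output shown. The second part applies row_number() per cluster with partition by, using a hypothetical table cluster_ids with columns cluster_name and id that stand in for the data in the question.

```sql
-- Assumed setup: one column, with the value 3 duplicated,
-- chosen so the query above returns the six rows shown.
create table t (id int);

insert into t (id) values (1), (2), (3), (3), (4), (5);

-- Hypothetical table mirroring the question's sample data.
create table cluster_ids (cluster_name varchar(10), id varchar(10));

insert into cluster_ids (cluster_name, id) values
  ('CLST1', 'ID1'),
  ('CLST1', 'ID2'),
  ('CLST2', 'ID1'),
  ('CLST2', 'ID2');

-- partition by restarts the numbering for every cluster,
-- so each cluster gets its own 1, 2, ... sequence.
select
  cluster_name,
  id,
  row_number() over (partition by cluster_name order by id) as rnk
from cluster_ids
order by cluster_name, rnk;
```

Note that this only numbers the rows within each cluster. On its own it does not stop the same ID from being number 1 in two clusters, so the cross-cluster assignment described in the question still needs an extra step.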


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1