'Is there a easy way to group/Clusters Entries in one Select

I have a table with several rows, that looks more or less like this.

+----+------------+-------------+
| ID | CustomerID | ProductType |
+----+------------+-------------+
| 1  | 1          | A           |
| 2  | 1          | A           |
| 3  | 1          | B           |
| 4  | 1          | B           |
| 5  | 1          | B           |
| 6  | 2          | A           |
| 7  | 2          | A           |
+----+------------+-------------+

ID is just a auto_increment identity
CustomerID is a unique Id for each customer
ProductType Type of the specific Product

I need a select, that creates the following output:

Desired result:

+----+-------------+----------------+------------+-------------+
| ID | Cluster_ID  | Sub_Cluster_ID | CustomerID | ProductType |
+----+-------------+----------------+------------+-------------+
| 1  | 1           | 1              | 1          | A           |
| 2  | 1           | 2              | 1          | A           |
| 3  | 2           | 1              | 1          | B           |
| 4  | 2           | 2              | 1          | B           |
| 5  | 2           | 3              | 1          | B           |
| 6  | 1           | 1              | 2          | A           |
| 7  | 1           | 2              | 2          | A           |
+----+-------------+----------------+------------+-------------+

Cluster_ID is a counter for a customer, that only counts up on a new ProductType (so all rows for a Customer, with the same ProductType, have the same Cluster_ID)
Sub_Cluster_ID is a simple counter, that counts up the entries for a ProductType and Customer (so all rows for a Customer, with the same Cluster_ID, have a incrementing Sub_Cluster_ID)

I tried using ROW_NUMBER, but I could not get it to work

Here some of my failed attempts:

  • This seems to works for the Sub_Cluster_ID, but not for the Cluster_ID.

    SELECT  [ID]
        , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID, ProductType) Cluster_ID
        , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID
        , [CustomerID]
        , [ProductType]
    FROM dbo.Table
    
  • This seems is almost there, but the Cluster_ID is still not correct

    SELECT  [ID]
        , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID) Cluster_ID
        , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID
        , [CustomerID]
        , [ProductType]
    FROM dbo.Table  
    

Almost there:
(but the clusterID is still not matching the needed Output)

 +----+------------+----------------+------------+-------------+
 | ID | Cluser_ID  | Sub_Cluster_ID | CustomerID | ProductType |
 +----+------------+----------------+------------+-------------+
 | 1  | 1          | 1              | 1          | A           |
 | 2  | 2          | 2              | 1          | A           |
 | 3  | 3          | 1              | 1          | B           |
 | 4  | 4          | 2              | 1          | B           |
 | 5  | 5          | 3              | 1          | B           |
 | 6  | 1          | 1              | 2          | A           |
 | 7  | 2          | 2              | 2          | A           |
 +----+------------+----------------+------------+-------------+

Is it anyway possible? Or do I need subqueries?



Solution 1:[1]

Seems like you need dense_rank() instead of row_number() for Cluster_ID:

select ID,
dense_rank() over(partition by CustomerID order by CustomerID, ProductType) as Cluster_ID,
ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID,
CustomerID,
ProductType
from table_name;

Fiddle

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1