'Is there a easy way to group/Clusters Entries in one Select
I have a table with several rows, that looks more or less like this.
+----+------------+-------------+
| ID | CustomerID | ProductType |
+----+------------+-------------+
| 1 | 1 | A |
| 2 | 1 | A |
| 3 | 1 | B |
| 4 | 1 | B |
| 5 | 1 | B |
| 6 | 2 | A |
| 7 | 2 | A |
+----+------------+-------------+
ID is just a auto_increment identity
CustomerID is a unique Id for each customer
ProductType Type of the specific Product
I need a select, that creates the following output:
Desired result:
+----+-------------+----------------+------------+-------------+
| ID | Cluster_ID | Sub_Cluster_ID | CustomerID | ProductType |
+----+-------------+----------------+------------+-------------+
| 1 | 1 | 1 | 1 | A |
| 2 | 1 | 2 | 1 | A |
| 3 | 2 | 1 | 1 | B |
| 4 | 2 | 2 | 1 | B |
| 5 | 2 | 3 | 1 | B |
| 6 | 1 | 1 | 2 | A |
| 7 | 1 | 2 | 2 | A |
+----+-------------+----------------+------------+-------------+
Cluster_ID is a counter for a customer, that only counts up on a new
ProductType(so all rows for a Customer, with the sameProductType, have the sameCluster_ID)
Sub_Cluster_ID is a simple counter, that counts up the entries for aProductTypeand Customer (so all rows for a Customer, with the sameCluster_ID, have a incrementingSub_Cluster_ID)
I tried using ROW_NUMBER, but I could not get it to work
Here some of my failed attempts:
This seems to works for the
Sub_Cluster_ID, but not for theCluster_ID.SELECT [ID] , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID, ProductType) Cluster_ID , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID , [CustomerID] , [ProductType] FROM dbo.TableThis seems is almost there, but the
Cluster_IDis still not correctSELECT [ID] , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID) Cluster_ID , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID , [CustomerID] , [ProductType] FROM dbo.Table
Almost there:
(but the clusterID is still not matching the needed Output)
+----+------------+----------------+------------+-------------+
| ID | Cluser_ID | Sub_Cluster_ID | CustomerID | ProductType |
+----+------------+----------------+------------+-------------+
| 1 | 1 | 1 | 1 | A |
| 2 | 2 | 2 | 1 | A |
| 3 | 3 | 1 | 1 | B |
| 4 | 4 | 2 | 1 | B |
| 5 | 5 | 3 | 1 | B |
| 6 | 1 | 1 | 2 | A |
| 7 | 2 | 2 | 2 | A |
+----+------------+----------------+------------+-------------+
Is it anyway possible? Or do I need subqueries?
Solution 1:[1]
Seems like you need dense_rank() instead of row_number() for Cluster_ID:
select ID,
dense_rank() over(partition by CustomerID order by CustomerID, ProductType) as Cluster_ID,
ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID,
CustomerID,
ProductType
from table_name;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
