SQL Server query select min over partition by
I'm having a hard time understanding the OVER (PARTITION BY ...) clause in a SELECT statement. I have read about it but still don't get it. What's the benefit of using:
SELECT MIN(Field) OVER (PARTITION BY OtherField) as Value
FROM MYTABLE
Can you briefly describe what this does, when you would use it, and why? What is the benefit, and what are you trying to achieve when you write something like this?
Solution 1:[1]
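The OVER (PARTITION BY OtherField) clause turns MIN() into a window function. The main idea is to compute the aggregate per partition without reducing the number of rows the query returns: every row is kept, and each one carries the aggregate value of its own partition.
Window functions are often faster than equivalent join/aggregation solutions, although this depends on the data and the available indexes. This is a rather simple case, so the performance might be essentially the same.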
Let me show an example:
CREATE TABLE dbo.Duration (startDate datetime2, endDate datetime2);
INSERT INTO dbo.Duration (startDate, endDate) VALUES ('2007-05-06 16:42:09', '2007-05-07 11:10:08');
INSERT INTO dbo.Duration (startDate, endDate) VALUES ('2007-05-07 08:30:09', '2007-05-07 12:12:43');
INSERT INTO dbo.Duration (startDate, endDate) VALUES ('2007-05-07 11:35:09', '2007-05-07 17:13:39');
INSERT INTO dbo.Duration (startDate, endDate) VALUES ('2007-05-07 08:30:10', '2007-05-07 12:12:43');
INSERT INTO dbo.Duration (startDate, endDate) VALUES ('2007-05-07 08:30:11', '2007-05-07 12:12:43');
INSERT INTO dbo.Duration (startDate, endDate) VALUES ('2007-05-07 08:30:12', '2007-05-07 12:12:43');
Now we can partition by the field endDate and get the MIN() of startDate within each endDate partition:
SELECT
    MIN(dr.startDate) OVER (PARTITION BY dr.endDate) AS MinDateByPartitionOfEndDate
  , dr.endDate
FROM dbo.Duration dr
Pay close attention to the rows with endDate 2007-05-07 12:12:43: every one of them gets the value 2007-05-07 08:30:09.0000000. So the query result was not reduced, yet each row carries the MIN value of its partition.
OUTPUT:
MinDateByPartitionOfEndDate endDate
2007-05-06 16:42:09.0000000 2007-05-07 11:10:08.0000000
2007-05-07 08:30:09.0000000 2007-05-07 12:12:43.0000000
2007-05-07 08:30:09.0000000 2007-05-07 12:12:43.0000000
2007-05-07 08:30:09.0000000 2007-05-07 12:12:43.0000000
2007-05-07 08:30:09.0000000 2007-05-07 12:12:43.0000000
2007-05-07 11:35:09.0000000 2007-05-07 17:13:39.0000000
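Because every row is kept, the partition minimum can be compared with that row's own values, which is one common reason to reach for this pattern. The query below is a minimal sketch against the dbo.Duration table above; the alias SecondsAfterPartitionMin and the ORDER BY are illustrative additions, not part of the original answer.

```sql
-- Sketch: compare each row with the earliest startDate in its endDate partition.
SELECT
    dr.startDate
  , dr.endDate
  , MIN(dr.startDate) OVER (PARTITION BY dr.endDate) AS MinDateByPartitionOfEndDate
    -- Seconds between this row's startDate and the minimum of its partition
    -- (SecondsAfterPartitionMin is a hypothetical alias chosen for this example)
  , DATEDIFF(SECOND,
             MIN(dr.startDate) OVER (PARTITION BY dr.endDate),
             dr.startDate) AS SecondsAfterPartitionMin
FROM dbo.Duration dr
ORDER BY dr.endDate, dr.startDate;
```

For the four rows that share endDate 2007-05-07 12:12:43, the last column would be 0, 1, 2 and 3, since their startDate values are one second apart. A plain GROUP BY cannot express this directly, because it no longer has the individual rows to compare against.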
And here is a GROUP BY query to show the main difference between GROUP BY and PARTITION BY: GROUP BY collapses each group into a single row, so it reduces the query result:
SELECT
    MIN(dr.startDate) AS MinDateByGroupBy
  , dr.endDate
FROM dbo.Duration dr
GROUP BY dr.endDate
OUTPUT:
MinDateByGroupBy endDate
2007-05-06 16:42:09.0000000 2007-05-07 11:10:08.0000000
2007-05-07 08:30:09.0000000 2007-05-07 12:12:43.0000000
2007-05-07 11:35:09.0000000 2007-05-07 17:13:39.0000000
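To get one output row per source row without a window function, the grouped result has to be joined back to the table. The following is a rough sketch of that join/aggregation alternative for the same dbo.Duration table; the derived-table alias g is an assumption made for this example.

```sql
-- Sketch: join/aggregation equivalent of
-- MIN(startDate) OVER (PARTITION BY endDate).
SELECT
    g.MinDateByGroupBy AS MinDateByPartitionOfEndDate
  , dr.endDate
FROM dbo.Duration dr
JOIN (
    -- One row per endDate with its earliest startDate
    SELECT endDate, MIN(startDate) AS MinDateByGroupBy
    FROM dbo.Duration
    GROUP BY endDate
) g
    ON g.endDate = dr.endDate;
```

With the sample data this should return the same rows as the PARTITION BY query, but the table is read twice and the query is longer. One difference to keep in mind: rows with a NULL endDate would be dropped by the equality join, whereas PARTITION BY places them together in a partition of their own.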
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
