'How to select non aggregated data when using aggregation functions [duplicate]
I am trying to include non-aggregated data, e.g. user_id after applying an aggregated function that does not include that same data.
I have found a solutions and looking for a more elegant solution.
I am looking for a solution without using a window function (like row_number).
Here is my data: It represent an event of a competition. Every unique combination of event_date and tier is a unique competition. My goal is to get the winner of each competition.
CREATE TABLE interview_2
(
user_id INT,
tier VARCHAR(25),
event_date DATE,
user_score INT
);
INSERT INTO interview_2 VALUES ( 1, 'A', GETDATE(), 100 );
INSERT INTO interview_2 VALUES ( 2, 'B', GETDATE(), 80);
INSERT INTO interview_2 VALUES ( 1, 'A', DATEADD( DAY, 1, GETDATE() ), 90);
INSERT INTO interview_2 VALUES ( 3, 'A', DATEADD( DAY, 1, GETDATE() ), 110);
INSERT INTO interview_2 VALUES ( 4, 'A', GETDATE(), 60);
INSERT INTO interview_2 VALUES ( 5, 'B', GETDATE(), 50);
GO
My solution was a simple join:
SELECT scores.tier,
scores.event_date,
users.user_id,
scores.max_score
FROM
interview_2 users
JOIN
(SELECT
tier,
event_date,
MAX(user_score) AS 'max_score'
FROM
interview_2 scores
GROUP BY
tier, event_date) scores ON users.tier = scores.tier
AND users.event_date = scores.event_date
AND users.user_score = scores.max_score
ORDER BY
1, 2, 4;
Output:
tier event_date user_id user_score
--------------------------------------
A 2022-03-18 1 100
A 2022-03-19 3 110
B 2022-03-18 2 80
Is there a simpler solution?
Solution 1:[1]
Update
Your existing query is good. Whether or not it's the simplest solution is kind of subjective. One alternative would be to put the subquery in the select list. It looks different, and might even perform different depending on your environment, but ultimately it the same logic, just organized differently.
SELECT DISTINCT
tier = i.tier,
event_date = i.event_date,
user_id = (
SELECT
w.user_id
FROM @interview_2 AS w
WHERE
w.user_score = MAX(i.user_score)
AND w.tier = i.tier
AND w.event_date = i.event_date
),
top_score = MAX(i.user_score)
FROM @interview_2 AS i
GROUP BY
i.tier,
i.event_date
ORDER BY
i.tier,
i.event_date,
MAX(i.user_score);
ORIGINAL POST
Window Functions:
SELECT DISTINCT
users.tier,
users.event_date,
FIRST_VALUE(users.user_id) OVER (PARTITION BY
users.tier,
users.event_date
ORDER BY users.user_score DESC
),
MAX(users.user_score) OVER (PARTITION BY users.tier, users.event_date)
FROM interview_2 AS users;
OVER PARTITION BY basically allows you to use aggregate functions on a column by column basis plus several other neat windowing functions. You can use different grouping and ordering for each. They are amazing once you understand them.
In this case:
FIRST_VALUE(user_id)will return the very first user_id it findsPARTITIONED (grouped) BYtier and event_date andORDER BY user_score DESCensures that the user_id with the highest score is the first record.MAX(user_score)will return the higest score it findsPARTITIONED (grouped) BYtier, event_date. No ordering is necessary here since it's an aggregate.DISTINCTis included because for each record with a corresponding tier/event_date theFIRST_VALUEandMAXwill be returned.
Here is a good starter article on Window Functions
Complete example
WITH
interview_2 AS (
SELECT
*
FROM (
VALUES (1, 'A', GETDATE(), 100),
(2, 'B', GETDATE(), 80),
(1, 'A', DATEADD(DAY, 1, GETDATE()), 90),
(3, 'A', DATEADD(DAY, 1, GETDATE()), 110),
(4, 'A', GETDATE(), 60),
(5, 'B', GETDATE(), 50)
) AS i2 (user_id, tier, event_date, user_score)
)
SELECT DISTINCT
users.tier,
users.event_date,
FIRST_VALUE(users.user_id) OVER (PARTITION BY
users.tier,
users.event_date
ORDER BY users.user_score DESC
),
MAX(users.user_score) OVER (PARTITION BY users.tier, users.event_date)
FROM interview_2 AS users;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
