'Distinct in an Inner Join
I am working on some SQL store procedures. In the SP I'm working at the moment I have to fill a table by taking some data from other tables. My problem is that in this Select in which there are Inner Joins, I need a Distinct to apply to only one Inner Join:
SELECT pss.ProgramScoreSummaryID
, vd.ProgramId, jou.TwAccountId, jou.DeviceId, psv.ScoreVersionID
, COUNT(jou.JourneyId) AS JourneyCount
, SUM(jou.Distance) AS TotalMileage
, SUM(jou.Duration) AS TotalDuration
FROM dbo.Program AS pro
INNER JOIN [jrn].[ProgramScoreSummary] AS pss ON pro.ProgramId = pss.ProgramId
INNER JOIN [dbo].[ProgramScoreVersion] AS psv ON pro.ProgramId = psv.ProgramID
AND pss.ScoreVersionID = psv.ScoreVersionID
INNER JOIN [jrn].[Journey] AS jou ON jou.TwAccountId = vd.TwAccountId
AND dev.DeviceId = jou.DeviceId
GROUP BY pss.ProgramScoreSummaryID
How do I insert a distinct so that the [Journey] table doesn't take duplicate data?
Solution 1:[1]
Use a subquery to transform the Journey table into what you want it to look like.
For each distinct row there is a bunch of rows corresponding to that row. The hard thing is to pick one row from each bunch to be the representative for that bunch. It's easy if you only need the distinct keys, or only need a simple aggregate like MIN or MAX. For more complicated ways to pick a representative you need to use ROW_NUMBER() OVER (PARTITION BY ...)
For example:
...
INNER JOIN (
SELECT TwAccountId, DeviceId,
SUM(Distance) AS Distance,
SUM(Duration) AS Duration
FROM Journey
GROUP BY TwAccountId, DeviceId
) AS jou
ON jou.TwAccountId = vd.TwAccountId AND dev.DeviceId = jou.DeviceId
It's generally easier to do your grouping in a subquery and then join to other things than it is to join then group.
Solution 2:[2]
Thanks to the previous answer I was able to find a solution:
What I did is use a With to make a select in which I use the functions
ROW_NUMBER () OVER (PARTITION BY ...)
here is how:
WITH TEST AS
(
select * from (select *, row_number() over (partition by Filename order by
JourneyId) as row_number from [jrn].[Journey] ) as rows
where row_number = 1
)
Subsequently in the main select I recalled the select written above, so the Journey table became Test, which thanks to the rownumber function had no duplicates with the same filename:
SELECT pss.ProgramScoreSummaryID
, vd.ProgramId, jou.TwAccountId, jou.DeviceId, psv.ScoreVersionID
, COUNT(jou.JourneyId) AS JourneyCount
, SUM(jou.Distance) AS TotalMileage
, SUM(jou.Duration) AS TotalDuration
FROM dbo.Program AS pro
INNER JOIN [jrn].[ProgramScoreSummary] AS pss ON pro.ProgramId = pss.ProgramId
INNER JOIN [dbo].[ProgramScoreVersion] AS psv ON pro.ProgramId = psv.ProgramID
AND pss.ScoreVersionID = psv.ScoreVersionID
INNER JOIN TEST AS jou ON jou.TwAccountId = vd.TwAccountId
AND dev.DeviceId = jou.DeviceId
GROUP BY pss.ProgramScoreSummaryID
I hope it will be of help to those who will find my same problem
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Richard Barraclough |
| Solution 2 | Vinceenzo |
