'Distinct in an Inner Join

I am working on some SQL store procedures. In the SP I'm working at the moment I have to fill a table by taking some data from other tables. My problem is that in this Select in which there are Inner Joins, I need a Distinct to apply to only one Inner Join:

    SELECT pss.ProgramScoreSummaryID
        , vd.ProgramId, jou.TwAccountId, jou.DeviceId, psv.ScoreVersionID
        , COUNT(jou.JourneyId) AS JourneyCount
        , SUM(jou.Distance) AS TotalMileage
        , SUM(jou.Duration) AS TotalDuration
    FROM dbo.Program AS pro
        INNER JOIN [jrn].[ProgramScoreSummary] AS pss ON pro.ProgramId = pss.ProgramId
        INNER JOIN [dbo].[ProgramScoreVersion] AS psv ON pro.ProgramId = psv.ProgramID
            AND pss.ScoreVersionID = psv.ScoreVersionID         
        INNER JOIN [jrn].[Journey] AS jou ON jou.TwAccountId = vd.TwAccountId
            AND dev.DeviceId = jou.DeviceId             
    GROUP BY pss.ProgramScoreSummaryID

How do I insert a distinct so that the [Journey] table doesn't take duplicate data?



Solution 1:[1]

Use a subquery to transform the Journey table into what you want it to look like.

For each distinct row there is a bunch of rows corresponding to that row. The hard thing is to pick one row from each bunch to be the representative for that bunch. It's easy if you only need the distinct keys, or only need a simple aggregate like MIN or MAX. For more complicated ways to pick a representative you need to use ROW_NUMBER() OVER (PARTITION BY ...)

For example:

...
  INNER JOIN (
    SELECT TwAccountId, DeviceId,
      SUM(Distance) AS Distance,
      SUM(Duration) AS Duration
    FROM Journey
    GROUP BY TwAccountId, DeviceId
  ) AS jou
    ON jou.TwAccountId = vd.TwAccountId AND dev.DeviceId = jou.DeviceId

It's generally easier to do your grouping in a subquery and then join to other things than it is to join then group.

Solution 2:[2]

Thanks to the previous answer I was able to find a solution: What I did is use a With to make a select in which I use the functions ROW_NUMBER () OVER (PARTITION BY ...) here is how:

WITH TEST AS 
(
 select * from (select *, row_number() over (partition by Filename order by 
 JourneyId) as row_number from [jrn].[Journey] ) as rows 
 where row_number = 1
 )

Subsequently in the main select I recalled the select written above, so the Journey table became Test, which thanks to the rownumber function had no duplicates with the same filename:

SELECT pss.ProgramScoreSummaryID
    , vd.ProgramId, jou.TwAccountId, jou.DeviceId, psv.ScoreVersionID
    , COUNT(jou.JourneyId) AS JourneyCount
    , SUM(jou.Distance) AS TotalMileage
    , SUM(jou.Duration) AS TotalDuration
FROM dbo.Program AS pro
    INNER JOIN [jrn].[ProgramScoreSummary] AS pss ON pro.ProgramId = pss.ProgramId
    INNER JOIN [dbo].[ProgramScoreVersion] AS psv ON pro.ProgramId = psv.ProgramID
        AND pss.ScoreVersionID = psv.ScoreVersionID         
    INNER JOIN TEST AS jou ON jou.TwAccountId = vd.TwAccountId
        AND dev.DeviceId = jou.DeviceId             
GROUP BY pss.ProgramScoreSummaryID

I hope it will be of help to those who will find my same problem

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Richard Barraclough
Solution 2 Vinceenzo