'Using Count + STRING_AGG, to display distinct values

I am trying to use COUNT with STRING_AGG. The problem is that in a subquery it counts as needed and I have count 1 when it's 1 employee that is taking the course, but when I run the full query I got the result doubled, like this:

COMPLETED: 1, COMPLETED: 1

When I put DISTINCT to count in STRING_AGG it gives me an error. Any idea how I can achieve this?

SELECT s1.course_id
      , s1.course_name
      , STRING_AGG(CONCAT(ts.training_status_name 
          + ': ' + 
          CAST(countt as varchar), ''),', ') AS num_of_attendies_and_status
FROM  (
         SELECT th.course_id
                , c.course_name
                , COUNT(DISTINCT e.employee_id) AS countt
         FROM course as c
                INNER JOIN training_history th ON c.course_id = th.course_id
                INNER JOIN employee e ON th.employee_id = e.employee_id
         GROUP BY th.course_id, course_name
      ) s1
      INNER JOIN training_history th ON s1.course_id = th.course_id
      INNER JOIN training_status ts ON th.training_status_id = ts.training_status_id
      INNER JOIN employee e ON th.employee_id = e.employee_id
GROUP BY s1.course_id
      , s1.course_name

Example of Output:

Course_id Course_name num_of_attendies_and_status
5 React COMPLETED: 1, COMPLETED: 1
7 C# COMPLETED: 1, COMPLETED: 1
9 Selenium REGISTERED: 1, REGISTERED: 1
13 Finance IN PROGRESS: 1, IN PROGRESS: 1

I have table employees with employee names and employee_id, courses with courses id, training_history which is a linkage between course_id and employee_id, and training status which shows course status name accordingly to a course id. I have 3 statuses here: COMPLETED, REGISTRED, IN PROGRESS.

The thing is, when I run the subquery:

  SELECT th.course_id
        , c.course_name
        , COUNT(DISTINCT e.employee_id) AS countt
  FROM   course as c
             INNER JOIN training_history th ON c.course_id = th.course_id
             INNER JOIN employee e ON th.employee_id = e.employee_id
  GROUP BY th.course_id
        , course_name

I got the following output:

Course_id Course_name num_of_attendies_and_status
5 React 1
7 C# 1
9 Selenium 1
13 Finance 1

But the final one for the first query, it's doubled. The thing is that there are 2 same employees that are duplicated. So I need somehow to distinct them. In the subquery COUNT(DISCTINCT employee_id) works fine. However in the main query, it's doubled. I am sorry for hard understanding questions. Hope something is clear.



Solution 1:[1]

I've used a CTE to collect the information with count(distinct) . NB I've added some lines to your sample data to have 2 statuses per course.

with training as
(SELECT 
  c.course_id,
  c.course_name,
  th.training_status_id as t_status_id,
  count(distinct employee_id) as count_emp
from dbo.course c
join dbo.training_history th on c.course_id = th.course_id
group by 
  c.course_id,
  c.course_name,
  th.training_status_id)
select 
  t.course_id,
  t.course_name,
  string_agg(concat(ts.training_status_name,':',count_emp),',') as num_of_attendies_and_status
from training t
join dbo.training_status ts
on t.t_status_id = ts.training_status_id
group by 
  t.course_id,
  t.course_name
;
course_id | course_name                                                  | num_of_attendies_and_status   
--------: | :----------------------------------------------------------- | :-----------------------------
        5 | React - The Complete Guide (incl Hooks, React Router, Redux) | COMPLETED: 1, COMPLETED: 1    
        7 | Learn Parallel Programming with C# and .NET                  | COMPLETED: 1, COMPLETED: 1    
        9 | Selenium WebDriver with Docker, Jenkins & AWS                | REGISTERED: 1, REGISTERED: 1  
       13 | Finance for Non Finance Executives                           | IN PROGRESS: 1, IN PROGRESS: 1

*db<>fiddle here5dab7f3db67ba78a)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1