'Produce DISTINCT values in STRING_AGG

I'm using the STRING_AGG function in SQL Server 2017. I'd like to create the same effect as COUNT(DISTINCT <column>). I tried STRING_AGG(DISTINCT <column>,',') but that is not legal syntax.

I'd like to know if there is a T-SQL work-around. Here is my sample:

WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 
SELECT State, City, COUNT(DISTINCT Siting) [# Of Types], STRING_AGG(Siting,',') Animals
FROM Sitings 
GROUP BY State, City

The above produces the following result:

+---------+-----------+--------------+-------------------------+
|  State  |   City    | # Of Types   |         Animals         |
+---------+-----------+--------------+-------------------------+
| Arizona | Flagstaff |            1 | dog                     |
| Florida | Orlando   |            2 | dog,bird                |
| Arizona | Phoenix   |            2 | bird,bird,bird,dog,bird |
+---------+-----------+--------------+-------------------------+

The output is exactly what I want, except I want the concatenated "Animals" listed for Phoenix Arizona to be DISTINCT, like this:

+---------+-----------+--------------+--------------------+
|  State  |   City    | # Of Types   |      Animals       |
+---------+-----------+--------------+--------------------+
| Arizona | Flagstaff |            1 | dog                |
| Florida | Orlando   |            2 | dog,bird           |
| Arizona | Phoenix   |            2 | bird,dog           |
+---------+-----------+--------------+--------------------+

Any ideas?

When I use my real data set, which is much larger, I get an error about the "Animals" column exceeding 8000 characters.

My question I think is the same as this one, except my example is much simpler.



Solution 1:[1]

just use sub-query

WITH Sitings 
      AS
      (
        SELECT * FROM (VALUES 
          (1, 'Florida', 'Orlando', 'bird'),
          (2, 'Florida', 'Orlando', 'dog'),
          (3, 'Arizona', 'Phoenix', 'bird'),
          (4, 'Arizona', 'Phoenix', 'dog'),
          (5, 'Arizona', 'Phoenix', 'bird'),
          (6, 'Arizona', 'Phoenix', 'bird'),
          (7, 'Arizona', 'Phoenix', 'bird'),
          (8, 'Arizona', 'Flagstaff', 'dog')
        ) F (ID, State, City, Siting)
      ) 

    select State,City,count(*) as [# Of Types],STRING_AGG(Siting,',') AS Animals from 
    (
      SELECT State, City, Siting
    FROM Sitings 
    GROUP BY State, City,Siting
    ) as T  group by State,City

http://sqlfiddle.com/#!18/ba4b8/11

  State     City    # Of Types  Animals
Arizona     Flagstaff   1   dog
Florida     Orlando     2   bird,dog
Arizona     Phoenix     2   bird,dog

Solution 2:[2]

Here is one more way of doing it (sql fiddle):

  WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 

select State,City,count(*) as [# Of Sitings],(select string_agg(value,', ') from (select distinct value from string_split(string_agg(Siting, ','),',')) t) AS Animals
FROM Sitings 
GROUP BY State, City

You may easily convert the splitting and merging part into a reusable scalar valued function.

NOTE

This is NOT an optimal solution, if you group first and then do aggregate (like answers above) it is better. Also, it does not get # of Types, it gets # of Sitings instead. However, it becomes handy as a quick inline function.

Solution 3:[3]

Of course a very late reply.

This is another way. The Siting in STRING_AGG(Siting,',') can be subqueried to returned a DISTINCT list of SITTING where the grouped keys match STATE and CITIES.

WITH Sitings 
      AS
      (
        SELECT * FROM (VALUES 
          (1, 'Florida', 'Orlando', 'bird'),
          (2, 'Florida', 'Orlando', 'dog'),
          (3, 'Arizona', 'Phoenix', 'bird'),
          (4, 'Arizona', 'Phoenix', 'dog'),
          (5, 'Arizona', 'Phoenix', 'bird'),
          (6, 'Arizona', 'Phoenix', 'bird'),
          (7, 'Arizona', 'Phoenix', 'bird'),
          (8, 'Arizona', 'Flagstaff', 'dog')
        ) F (ID, State, City, Siting)
      ) 

SELECT 
    S.State, 
    S.City, 
    COUNT(DISTINCT S.Siting) AS [# Of Types],

    --STRING_AGG(S.Siting,',') AS Animals
    (
        SELECT STRING_AGG(U.SITING, ',')
        FROM 
        (
            SELECT DISTINCT T.Siting
            FROM Sitings AS T
            WHERE 
                T.State = S.State AND
                T.City = S.City
        ) AS U
    ) AS ANIMAL

FROM 
    Sitings AS S
GROUP BY 
    S.State, 
    S.City
ORDER BY
    S.State, 
    S.City

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Zaynul Abadin Tuhin
Solution 2
Solution 3 Kwok Hoi Man