'How do I aggregate values from key-value pairs in Snowflake?

I have a table with a column containing a comma-delimited list of "event" IDs. Some of the events have a quantity associated, delimited by an equals sign. Here is an example:

237=33.00,238=98.00,239,100,101, ...
  • So I have the following events: 237, 238, 239, 100, 101

  • And these events have a quantity associated: 237 (33.00), 238 (98.00)

I'm trying to count occurrences of events and for events with a quantity, either average or sum. I'm also trying to include other dimensional columns from the table, like day.

This seems to be working, but I'm wondering with all the Snowflake functions for semi-structured data if there is a better approach. This also doesn't totally solve my problem as I need each aggregated value in its own column.

with cte as (
  select
    day,
    EVENTS
  from mytable
  where not EVENTS is null
)
select
    day,
    avg(split_part(value, '=', 2)) as avgOfEvent237
from cte, lateral split_to_table(cte.EVENTS, ',')
where value like '237=%'
group by 1;
DAY AVGOFEVENT237
2022-03-01 35.9


Solution 1:[1]

So given split_to_tables will discard rows where there is no splitting, you can drop the CTE that just prunes the NULLs

Then you can embedded a IFF and STARTSWITH into the AVG to get the values per value.

SELECT m.day
    ,AVG(iff(startswith(v.value, '237='), split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(startswith(v.value, '238='), split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;

So using this CTE with "fake data"

WITH mytable(day, events) as (
    select * FROM VALUES
    ('2022-03-23'::date, '237=33.00,238=98.00,239,100,101'),
    ('2022-03-23'::date, '237=35.00,238=96.00,239,100,101')
)
SELECT m.day
    ,AVG(iff(startswith(v.value, '237='), split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(startswith(v.value, '238='), split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;

I get the results:

DAY AVGOFEVENT237 AVGOFEVENT238
2022-03-23 34 97

You LIKE version is smaller:

SELECT m.day
    ,AVG(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1

And you can spin it out to do your own SUM/COUNT and AVG:

SELECT m.day
    ,AVG(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as avgOfEvent238
    ,SUM(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as sumOfEvent237
    ,SUM(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as sumOfEvent238
    ,count(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as countOfEvent237
    ,count(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as countOfEvent238
    ,DIV0(sumOfEvent237, countOfEvent237) as byHandAvgOfEvent237
    ,DIV0(sumOfEvent238, countOfEvent238) as byHandAvgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1