'How do I aggregate values from key-value pairs in Snowflake?
I have a table with a column containing a comma-delimited list of "event" IDs. Some of the events have a quantity associated, delimited by an equals sign. Here is an example:
237=33.00,238=98.00,239,100,101, ...
So I have the following events: 237, 238, 239, 100, 101
And these events have a quantity associated: 237 (33.00), 238 (98.00)
I'm trying to count occurrences of events and for events with a quantity, either average or sum. I'm also trying to include other dimensional columns from the table, like day.
This seems to be working, but I'm wondering with all the Snowflake functions for semi-structured data if there is a better approach. This also doesn't totally solve my problem as I need each aggregated value in its own column.
with cte as (
select
day,
EVENTS
from mytable
where not EVENTS is null
)
select
day,
avg(split_part(value, '=', 2)) as avgOfEvent237
from cte, lateral split_to_table(cte.EVENTS, ',')
where value like '237=%'
group by 1;
| DAY | AVGOFEVENT237 |
|---|---|
| 2022-03-01 | 35.9 |
Solution 1:[1]
So given split_to_tables will discard rows where there is no splitting, you can drop the CTE that just prunes the NULLs
Then you can embedded a IFF and STARTSWITH into the AVG to get the values per value.
SELECT m.day
,AVG(iff(startswith(v.value, '237='), split_part(value, '=', 2)::float, null)) as avgOfEvent237
,AVG(iff(startswith(v.value, '238='), split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;
So using this CTE with "fake data"
WITH mytable(day, events) as (
select * FROM VALUES
('2022-03-23'::date, '237=33.00,238=98.00,239,100,101'),
('2022-03-23'::date, '237=35.00,238=96.00,239,100,101')
)
SELECT m.day
,AVG(iff(startswith(v.value, '237='), split_part(value, '=', 2)::float, null)) as avgOfEvent237
,AVG(iff(startswith(v.value, '238='), split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;
I get the results:
| DAY | AVGOFEVENT237 | AVGOFEVENT238 |
|---|---|---|
| 2022-03-23 | 34 | 97 |
You LIKE version is smaller:
SELECT m.day
,AVG(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as avgOfEvent237
,AVG(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1
And you can spin it out to do your own SUM/COUNT and AVG:
SELECT m.day
,AVG(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as avgOfEvent237
,AVG(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as avgOfEvent238
,SUM(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as sumOfEvent237
,SUM(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as sumOfEvent238
,count(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as countOfEvent237
,count(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as countOfEvent238
,DIV0(sumOfEvent237, countOfEvent237) as byHandAvgOfEvent237
,DIV0(sumOfEvent238, countOfEvent238) as byHandAvgOfEvent238
FROM mytable m
,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
