'Take the Supersets of Arrays
i have a column in a Snowflake table that returns arrays.
Eg-
['A','U']
['A','P','U']
['A','P']
['P','U']
['M','S']
['S']
i need to remove the subsets and just take the supersets, so only
['A','P','U']
['M','S']
is there an easy way to do this?
Solution 1:[1]
not the best solution but if you can't find better try something like this:
SELECT *
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS row_num
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY c) AS row_num
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY c ORDER BY b) AS row_num
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY b ORDER BY b) AS row_num
from supertset
) where row_num = 1
)
)
)
QUALIFY row_num = 1
for reference:
https://docs.snowflake.com/en/sql-reference/constructs/qualify.html
Solution 2:[2]
For each array we are going to look what array contains it, by looking with a cross join for one with the largest intersection, and then choose the one that's the largest between those:
select distinct x
from (
select b.x
from data a
cross join data b
qualify 1 = row_number() over(
partition by a.id
order by array_size(array_intersection(a.x, b.x)) desc
, array_size(b.x) desc
)
)
This might not work with a different sample dataset, but the rules that the question includes are not enough to determine the right solution in other situations.
Data setup:
with data as (
select row_number() over(order by 1) id, parse_json(value) x
from table(split_to_table($$['A','U']
['A','P','U']
['A','P']
['P','U']
['M','S']
['S']$$, '\n'))
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jakub |
| Solution 2 | Felipe Hoffa |

