'Take the Supersets of Arrays

i have a column in a Snowflake table that returns arrays.

Eg-

['A','U']
['A','P','U']
['A','P']
['P','U']
['M','S']
['S']

i need to remove the subsets and just take the supersets, so only

['A','P','U']
['M','S']

is there an easy way to do this?



Solution 1:[1]

not the best solution but if you can't find better try something like this:

SELECT * 
FROM (
     SELECT a, b, c, 
            ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS row_num
        FROM (
           SELECT a, b, c, 
            ROW_NUMBER() OVER (PARTITION BY a ORDER BY c) AS row_num
        FROM (
            SELECT a, b, c, 
            ROW_NUMBER() OVER (PARTITION BY c ORDER BY b) AS row_num
            FROM (
           SELECT a, b, c, 
            ROW_NUMBER() OVER (PARTITION BY b ORDER BY b) AS row_num
            from supertset
            ) where row_num = 1
        )
        )
    )
QUALIFY row_num = 1

for reference:

https://docs.snowflake.com/en/sql-reference/constructs/qualify.html

Solution 2:[2]

For each array we are going to look what array contains it, by looking with a cross join for one with the largest intersection, and then choose the one that's the largest between those:

select distinct x
from (
    select b.x 
    from data a
    cross join data b
    qualify 1 = row_number() over(
        partition by a.id 
        order by array_size(array_intersection(a.x, b.x)) desc
        , array_size(b.x) desc
    )
)

enter image description here

This might not work with a different sample dataset, but the rules that the question includes are not enough to determine the right solution in other situations.

Data setup:

with data as (
select row_number() over(order by 1) id, parse_json(value) x
from table(split_to_table($$['A','U']
['A','P','U']
['A','P']
['P','U']
['M','S']
['S']$$, '\n')) 
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jakub
Solution 2 Felipe Hoffa