'How to extract all values from variant/object in Snowflake?

One variant column contains the following data:

[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  }
]

where numbers are values. If I would want to extract all keys as one array, there is a function OBJECT_KEYS for this. But how can I extract all values to get this output?

[
  "1",
  "2",
  "3"
]

Additional note. Keys are always the same as well as values mapped to them. More detailed example. For input with 3 records/rows:

[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  }
]
[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  }
]
[
  {
    "a": "1",
    "c": "3"
  }
]

The output should be:

{"1", "2", "3"}
{"1", "2", "3"}
{"1", "3"}


Solution 1:[1]

A single Object per array:

Assuming the data has only a single object per array:

With data as (
    select parse_json(column1) as json
    from values
    ('[{"a": "1","b": "2","c": "3"}]'),
    ('[{"a": "1","b": "2","c": "3"}]'),
    ('[{"a": "1","c": "3"}]')
)
select 
    '{'|| listagg(distinct '"'||v.value||'"', ',') within group (order by '"'||v.value||'"')|| '}' as output
from data, table(flatten(json[0]))v
group by v.seq
order by v.seq

gives:

OUTPUT
{"1","2","3"}
{"1","2","3"}
{"1","3"}

Multiple Objects per array, merged:

With data as (
    select parse_json(column1) as json
    from values
    ('[{"a": "1","b": "2","c": "3"},{"a": "1","d": "4","e": "5"}]'),
    ('[{"a": "1","b": "2","c": "3"}]'),
    ('[{"a": "1","c": "3"}]')
)
select 
    '{'|| listagg(distinct '"'||v.value||'"', ',') within group (order by '"'||v.value||'"')|| '}' as output
from data
    ,table(flatten(json))a
    ,table(flatten(a.value))v
group by a.seq
order by a.seq

gives:

OUTPUT
{"1","2","3","4","5"}
{"1","2","3"}
{"1","3"}

Solution 2:[2]

There are a few ways to do this depending on how fixed the description of the data is that you have provided?

If there is only ever one element in your outer array per your example, or you only ever want the 1st element if there are multiple, you can first pluck the element from the array with mycol[0] and then lateral flatten the array, and array_agg the VALUEs from the lateral flatten

-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
-- Query
Select array_agg(first_inner_array.value) result
from data, 
     lateral flatten(input => mycol[0]) first_inner_array
--Group By first_inner_array.index
;

If you have multiple elements in the array that you need to extract you can use two lateral flattens.

-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
-- Query
Select flat_outer_array.index, array_agg(flat_inner_array.value) 
from data, 
     lateral flatten(input => mycol) flat_outer_array,  
     lateral flatten(input => flat_outer_array.value) flat_inner_array
-- Uncomment the where clause below to use this solution to pick the 1st element only
-- Where flat_outer_array.INDEX = 0     
Group By flat_outer_array.index     ;

You could also create a simple Javascript function to extract VALUES from the Object, as per the inverse of OBJECT_KEYS returning the KEYS from an object which you were looking for. This avoids the flatten and array_agg operations.

create Or Replace Function OBJECT_VALUES(input_object object)
-- Pluck the Values from an Object and create an ARRAY.
-- Maintains the order of the OBJECT (KEYS) in the ARRAY.
  returns ARRAY
  language JAVASCRIPT
as
$$
// Return the values from the Object as an ARRAY
  return Object.values(INPUT_OBJECT);
$$;

-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
Select OBJECT_VALUES(mycol[0]) from data

Or if you need multiple elements from the ARRAY.

-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
-- Query
Select flat_outer_array.index, OBJECT_VALUES(value) 
from data, 
     lateral flatten(input => mycol) flat_outer_array
Group By 1,2;

The performance of these may vary based on data scale and shape of the data so you may want to try out the different options.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Simeon Pilgrim
Solution 2 Fieldy