Return elements of Redshift JSON array on separate rows
I have a Redshift table that looks like this:
id | metadata
---------------------------------------------------------------------------
1 | [{"pet":"dog"},{"country":"uk"}]
2 | [{"pet":"cat"}]
3 | []
4 | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]
- All array elements have just one field.
- There is no guarantee that a particular field will feature in any of an array's elements.
- A field name can be repeated in an array.
- The array elements can be in any order.
I want to get back a table that looks like this:
id | field | value
------------------------
1 | pet | dog
1 | country | uk
2 | pet | cat
4 | country | germany
4 | education | masters
4 | country | belgium
I can then combine this with my queries on the rest of the input table.
I have tried playing around with the Redshift JSON functions, but without being able to write functions, use loops, or declare variables in Redshift, I really can't see a way to do this!
Please let me know if I can clarify anything else.
Solution 1:[1]
There is a generic version of CREATE VIEW seq_0_to_3 (a view that returns the integers 0 through 3). Let's call it CREATE VIEW seq_0_to_n. It can be generated by:
CREATE VIEW seq_0_to_n AS (
    SELECT row_number() OVER (ORDER BY TRUE)::integer - 1 AS i
    FROM <insert_large_enough_table>
    LIMIT <number_less_than_table_entries>
);
This makes it possible to generate arbitrarily large sequences as a view.
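To show how such a sequence view might actually be applied to the question's table, here is a minimal sketch. It cross joins the sequence with the built-in Redshift JSON functions json_array_length() and json_extract_array_element_text() to put each array element on its own row, then pulls the key and value out of each one-field object with regexp_substr(). The table name my_table is hypothetical, and the regular expressions assume every element has the shape {"field":"value"} shown in the question.
-- Sketch only: my_table/metadata stand in for the question's table, and
-- seq_0_to_n must contain at least as many rows as the longest array.
SELECT
    e.id,
    regexp_substr(e.elem, '"([^"]+)"', 1, 1, 'e') AS field,  -- first quoted token = key
    regexp_substr(e.elem, '"([^"]+)"', 1, 2, 'e') AS value   -- second quoted token = value
FROM (
    SELECT
        t.id,
        json_extract_array_element_text(t.metadata, s.i) AS elem
    FROM my_table t
    CROSS JOIN seq_0_to_n s
    WHERE s.i < json_array_length(t.metadata)  -- one row per element; [] yields none
) e
ORDER BY e.id;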
Solution 2:[2]
It's now possible in Redshift to treat strings in either array format ([]) or object format ({}) as parsable JSON structures. First, let's make a temp table based on your data:
create temporary table #t1 (id int, json_str varchar(100));
truncate table #t1;
insert into #t1 values (1, '[{"pet":"dog"},{"country":"uk"}]');
insert into #t1 values (2, '[{"pet":"cat"}]');
insert into #t1 values (3, '[]');
insert into #t1 values (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]');
The following statement uses a common table expression (CTE) to convert the json_str field into a formal JSON structure of SUPER type via json_parse(). If the table's field were already SUPER type, we could skip this step.
drop table if exists #t2;
create temporary table #t2 as
with cte as
(select
x.id,
json_parse(x.json_str) as json_str -- convert string to SUPER structure
from
#t1 x
)
select
x.id
,unnested
from
cte x, x.json_str as unnested -- an alias of cte and x.json_str is required!
order by
id
;
Now we have an exploded list of key/value pairs to easily extract:
select
t2.id
,json_key -- this is the extracted key
,cast(json_val as varchar) as json_val -- eliminates the double quote marks
from
#t2 t2, unpivot t2.unnested as json_val at json_key --"at some_label" (e.g. json_key) will extract the key
order by
id
;
A different way to render the info is to allow the parsing engine to turn keys into columns. This isn't what you asked for, but potentially interesting:
select
id
,cast(t2.unnested.country as varchar) as country -- data is already parsed into rows, so it's directly addressable now
,cast(t2.unnested.education as varchar) as education
,cast(t2.unnested.pet as varchar) as pet
from
#t2 t2
;
If you want more info on this, search the Amazon Redshift documentation for parsing the SUPER data type. If the data already existed as SUPER in the Redshift table, these latter two queries would work natively against the table, with no need for a temp table.
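For completeness, here is a minimal sketch of what that native query could look like, assuming a hypothetical table my_table whose metadata column is already SUPER. It simply replaces the #t2 temp table with a CTE, combining the same unnest and unpivot steps shown above into one statement.
-- Sketch only: assumes my_table.metadata is already SUPER.
WITH exploded AS (
    SELECT t.id, unnested
    FROM my_table t, t.metadata AS unnested   -- unnest: one row per array element
)
SELECT
    e.id,
    json_key AS field,
    cast(json_val AS varchar) AS value
FROM exploded e, UNPIVOT e.unnested AS json_val AT json_key  -- key/value per element
ORDER BY e.id;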
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | João Luiz Carabetta |
| Solution 2 | |
