'Amazon Athena parsing JSON

Struggling to parse some json. This is the format, where there are no fixed names/keys - everything is dynamic.

{ "{condition-operator}" : { "{condition-key}" : "{condition-value}" }}

An an example of values:

{
  "bool":{"aws:viaawsservice":"true"},
  "stringequals":{
        "ec2:createaction":[
            "CreateSecurityGroup",
            "CreateVolume",
            "CreateSnapshot",
            "RunInstances"
        ]
    }
}

I've managed to extract the 'operator' and 'key' values. (See below)

However, my result for 'values' is problematic.

One value is 'true', the other ["CreateSecurityGroup","CreateVolume","CreateSnapshot","RunInstances"]

Neither of which I seem able to use or cast as an UNNESTable array.

To be honest, getting woefully lost in what's going on !! I need to be able to unnest these, to get 1 row per value (so 5 values/rows in total)

Any guidance appreciated !

with cte as (
  select '{"bool":{"aws:viaawsservice":"true"},"stringequals":{"ec2:createaction":["CreateSecurityGroup","CreateVolume","CreateSnapshot","RunInstances"]}}'
    as sample
)
select 
  ,ct.ct as condition_operator
  ,map_keys(cast(ct.cb as map<varchar,json>))[1] as condition_key
  , map_values(cast(ct.cb as map<varchar,json>))[1] as condition_values
from 
  cte
  CROSS JOIN UNNEST(map_keys(cast(json_parse(cte.sample)as map<varchar,json>)),map_values(cast(json_parse(cte.sample)as map<varchar,json>))) ct(ct,cb)
  -- CROSS JOIN UNNEST( ## something here ##) as values(v)



condition_ope..   condition_key       condition_values 
(string(255))    (string(255))       (json) 
bool             aws:viaawsservice   "true" 
stringequals     ec2:createaction ["CreateSecurityGroup","CreateVolume","CreateSnapshot","RunInstances"]


Solution 1:[1]

You can use try, which results in null in case of failure, and attempt to cast data to array of varchar and fallback to either cast to varchar (which will fail in case of json object in value) or just using json_format:

select ct.ct as condition_operator,
    ct_key,
    ct_value
from cte
    CROSS JOIN UNNEST(
        map_keys(cast(json_parse(cte.sample) as map < varchar, json >)),
        map_values(cast(json_parse(cte.sample) as map < varchar, json >))
    ) ct(ct, cb)
    CROSS JOIN UNNEST(
        map_keys(cast(ct.cb as map < varchar, json >)),
        map_values(cast(ct.cb as map < varchar, json >))
    ) ct1(ct_key, ct_value_json)
    CROSS JOIN UNNEST(
        coalesce(try(cast(ct_value_json as array < varchar >)),array [ json_format(ct_value_json) ]
        )
    ) ct2(ct_value)

Output:

condition_operator ct_key ct_value
bool aws:viaawsservice true
stringequals ec2:createaction CreateSecurityGroup
stringequals ec2:createaction CreateVolume
stringequals ec2:createaction CreateSnapshot
stringequals ec2:createaction RunInstances

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1