'In Athena how do I query a member of a struct in an array in a struct?
I am trying to figure out how to query where I am checking the value of usage given the following table creation:
CREATE EXTERNAL TABLE IF NOT EXISTS foo.test (
`id` string,
`foo` struct< usages:array< struct< usage:string,
method_id:int,
start_at:string,
end_at:string,
location:array<string> >>>
) PARTITIONED BY (
timestamp date
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1' ) LOCATION 's3://foo.bar/' TBLPROPERTIES ('has_encrypted_data'='false');
I would like to have a query like:
SELECT * FROM "foo"."test" WHERE foo.usages.usage is null;
When I do that I get:
SYNTAX_ERROR: line 1:53: Expression "foo"."usages" is not of type ROW
If I make my query where I directly index the array as seen in the following it works.
SELECT * FROM "foo"."test" WHERE foo.usages[1].usage is null;
My overall goal though is to query across all items in the usages array and find any row where at least one item in the usages array has a member usage that is null.
Solution 1:[1]
You can achieve this by unnesting the array into rows and then check those for null values. This will result in one row per null-value entry.
select * from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null
So if you only nee the unique set, you must run this through a select distinct.
select distinct id from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null
Solution 2:[2]
Athena is based on Presto. In Presto 318 you can use any_match:
SELECT * FROM "foo"."test"
WHERE any_match(foo.usages, element -> element.usage IS NULL);
I think the function is not available in Athena yet, but you can emulate it using reduce.
SELECT * FROM "foo"."test"
WHERE reduce(
foo.usages, -- array to reducing
false, -- initial state
(state, element) -> state OR element.usage IS NULL, -- combining function
state -> state); -- output function (identity in this case)
Solution 3:[3]
Another way to emulate any_match(<array>, <function>) is with cardinality(filter(<array>, <function>)) > 0.
SELECT * FROM "foo"."test"
WHERE any_match(foo.usages, element -> element.usage IS NULL);
Becomes:
SELECT * FROM "foo"."test"
WHERE cardinality(filter(foo.usages, element -> element.usage IS NULL)) > 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jens walter |
| Solution 2 | |
| Solution 3 | Steven |
