'converting a struct to a json when querying athena
I have an athena table which I did not create or manage, but can query. one of the fields is a struct type. for the sake of the example let's suppose it looks like this:
my_field struct<a:string,
b:string,
c:struct<d:string,e:string>
>
Now, I know how to query specific fields within this struct. But in one of my queries I need to extract the complete struct. so I just use:
select my_field from my_table
and the result looks like a string:
{a=aaa, b=bbb, c={d=ddd, e=eee}}
I want to get the result as a json string:
{"a":"aaa", "b":"bbb","c":{"d":"ddd", "e":"eee"}}
this string will then be processed by another application, this is why i need it in json format.
How can I achieve this?
EDIT: Better still, is there a way to query the struct in a way that flattens it? so the result would look like:
a | b | c.d | c.e |
-------------------------------
aaa | bbb | ddd | eee |
Solution 1:[1]
You can directly reference nested fields with a parent_field.child_field notation. Try:
SELECT
my_field,
my_field.a,
my_field.b,
my_field.c.d,
my_field.c.e
FROM
my_table
Solution 2:[2]
We can convert the structs from athena output to objects by Post processing. Below script may help
Assuming sample string received for the nested object
{description=Check the Primary key count of TXN_EVENT table in Oracle, datastore_order=1, zone=yellow, aggregation_type=count, updatedcount=0, updatedat=[2021-06-09T02:03:20.243Z]}
It can be parsed using the help of this npm package athena-struct-parser package.
- Nodejs -- https://www.npmjs.com/package/athena-struct-parser
- Python -- AWS Athena export array of structs to JSON
Sample Code
var parseStruct =require('athena-struct-parser') ;
var str = '{description=Check the Primary key count of TXN_EVENT table in Oracle, datastore_order=1, zone=yellow, aggregation_type=count, updatedcount=0, updatedat=[2021-06-09T02:03:20.243Z]}'
var parseObj = parseStruct(str)
console.log(parseObj);
Result Parsed Output
{
description: 'Check the Primary key count of TXN_EVENT table in Oracle',
datastore_order: '1',
zone: 'yellow',
aggregation_type: 'count',
updatedcount: '0',
updatedat: [ '2021-06-09T02:03:20.004Z' ]
}
Solution 3:[3]
Gave a response to a similar question: AWS Athena export array of structs to JSON
I used a simple approach to get around the struct -> json Athena limitation. I created a second table where the json columns were saved as raw strings. Using presto json and array functions I was able to query the data and return the valid json string to my program:
--Array transform functions too
select
json_extract_scalar(dd, '$.timestamp') as timestamp,
transform(cast(json_extract(json_parse(dd), '$.stats') as ARRAY<JSON>), x -> json_extract_scalar(x, '$.time')) as arr_stats_time,
transform(cast(json_extract(json_parse(dd), '$.stats') as ARRAY<JSON>), x -> json_extract_scalar(x, '$.mean')) as arr_stats_mean,
transform(cast(json_extract(json_parse(dd), '$.stats') as ARRAY<JSON>), x -> json_extract_scalar(x, '$.var')) as arr_stats_var
from
(select '{"timestamp":1520640777.666096,"stats":[{"time":15,"mean":45.23,"var":0.31},{"time":19,"mean":17.315,"var":2.612}],"dets":[{"coords":[2.4,1.7,0.3], "header":{"frame":1,"seq":1,"name":"hello"}}],"pos": {"x":5,"y":1.4,"theta":0.04}}' as dd);
I know the query will take longer to execute but there are ways to optimize.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | James |
| Solution 2 | Ranjithkumar MV |
| Solution 3 | zipate |
