'Dealing with unstructured data in snowflake
I have unstructured data in a column in the format
[{
"M": {
"accountNumber": {
"S": "1000022"
},
"orgUrns": {
"L": [{
"S": "cmp-pp-org-310"
}, {
"S": "cmp-pp-org-7096"
}, {
"S": "cmp-pp-org-7097"
}]
}
}
}]
From this I want to separate accountNumber and orgUurn into separate column .
Tried many ways but not getting result. Can any one suggest solution?
Solution 1:[1]
putting your data into a CTE:
WITH data(json) as (
SELECT parse_json(column1) from values
('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]')
)
this sql uses FLATTEN twice to unroll the two arrays (the outer array, and the inner array of L
SELECT
f.value:M:accountNumber,
o.value
FROM data as d,
table(flatten(input=>d.json)) f,
table(flatten(input=>f.value:M:orgUrns:L)) o
;
gives:
| F.VALUE:M:ACCOUNTNUMBER | VALUE |
|---|---|
| { "S": "1000022" } | { "S": "cmp-pp-org-310" } |
| { "S": "1000022" } | { "S": "cmp-pp-org-7096" } |
| { "S": "1000022" } | { "S": "cmp-pp-org-7097" } |
which could have those final value access cleaned up a little like:
SELECT
f.value:M:accountNumber:S::number as accountNumber ,
o.value:S::text as orgUrn
FROM data as d,
table(flatten(input=>d.json)) f,
table(flatten(input=>f.value:M:orgUrns:L)) o
| ACCOUNTNUMBER | ORGURN |
|---|---|
| 1000022 | cmp-pp-org-310 |
| 1000022 | cmp-pp-org-7096 |
| 1000022 | cmp-pp-org-7097 |
Re Comment:
So the code to use ether need to include the CTE:
WITH data(json) as (
SELECT parse_json(column1) from values
('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]')
)
SELECT
f.value:M:accountNumber:S::number as accountNumber ,
o.value:S::text as orgUrn
FROM data as d,
table(flatten(input=>d.json)) f,
table(flatten(input=>f.value:M:orgUrns:L)) o
;
Or you have to create a table, which takes time as I have to clean it up
CREATE TABLE data(json variant);
INSERT INTO data select parse_json('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]');
SELECT
f.value:M:accountNumber:S::number as accountNumber ,
o.value:S::text as orgUrn
FROM data as d,
table(flatten(input=>d.json)) f,
table(flatten(input=>f.value:M:orgUrns:L)) o
;
Or you put the JSON parsing into the middle of the SQL and that also can be confusing.
SELECT
f.value:M:accountNumber:S::number as accountNumber ,
o.value:S::text as orgUrn
FROM table(flatten(input=>parse_json('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]'))) f,
table(flatten(input=>f.value:M:orgUrns:L)) o
;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
