'Dealing with unstructured data in snowflake

I have unstructured data in a column in the format

[{
  "M": {
    "accountNumber": {
      "S": "1000022"
    },
    "orgUrns": {
      "L": [{
        "S": "cmp-pp-org-310"
      }, {
        "S": "cmp-pp-org-7096"
      }, {
        "S": "cmp-pp-org-7097"
      }]
    }
  }
}]

From this I want to separate accountNumber and orgUurn into separate column .

Tried many ways but not getting result. Can any one suggest solution?



Solution 1:[1]

putting your data into a CTE:

WITH data(json) as (
    SELECT parse_json(column1) from values
        ('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]')
)

this sql uses FLATTEN twice to unroll the two arrays (the outer array, and the inner array of L

SELECT 
    f.value:M:accountNumber,
    o.value
FROM data as d,
    table(flatten(input=>d.json)) f,
    table(flatten(input=>f.value:M:orgUrns:L)) o
    ;

gives:

F.VALUE:M:ACCOUNTNUMBER VALUE
{ "S": "1000022" } { "S": "cmp-pp-org-310" }
{ "S": "1000022" } { "S": "cmp-pp-org-7096" }
{ "S": "1000022" } { "S": "cmp-pp-org-7097" }

which could have those final value access cleaned up a little like:

SELECT 
    f.value:M:accountNumber:S::number as accountNumber ,
    o.value:S::text as orgUrn
FROM data as d,
    table(flatten(input=>d.json)) f,
    table(flatten(input=>f.value:M:orgUrns:L)) o
ACCOUNTNUMBER ORGURN
1000022 cmp-pp-org-310
1000022 cmp-pp-org-7096
1000022 cmp-pp-org-7097

Re Comment:

So the code to use ether need to include the CTE:

WITH data(json) as (
    SELECT parse_json(column1) from values
        ('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]')
)
SELECT 
    f.value:M:accountNumber:S::number as accountNumber ,
    o.value:S::text as orgUrn
FROM data as d,
    table(flatten(input=>d.json)) f,
    table(flatten(input=>f.value:M:orgUrns:L)) o
    ;

Or you have to create a table, which takes time as I have to clean it up

CREATE TABLE data(json variant);

INSERT INTO data select parse_json('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]');

SELECT 
    f.value:M:accountNumber:S::number as accountNumber ,
    o.value:S::text as orgUrn
FROM data as d,
    table(flatten(input=>d.json)) f,
    table(flatten(input=>f.value:M:orgUrns:L)) o
    ;

Or you put the JSON parsing into the middle of the SQL and that also can be confusing.

SELECT 
    f.value:M:accountNumber:S::number as accountNumber ,
    o.value:S::text as orgUrn
FROM table(flatten(input=>parse_json('[{"M":{"accountNumber":{"S":"1000022"},"orgUrns":{"L":[{"S":"cmp-pp-org-310"},{"S":"cmp-pp-org-7096"},{"S":"cmp-pp-org-7097"}]}}}]'))) f,
    table(flatten(input=>f.value:M:orgUrns:L)) o
    ;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1