'How to parse json in snowflake to get the count based on certain field in json

[
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ROLE"
  },
  {
    "type": "ACCESS_PROFILE"
  }
]

I have a column with the above json value in snowflake. I trying to get the count for a particular type. For example , I want the counts for type = ENTITLEMENT. This should be 2.

I could flatten this data. But this would produce a lot of duplicate data for columns ( i have a table with large number of columns).

Looking for ways to parse this json.

Things I have tried.

  1. Write a java udf to parse the the content of example
REATE OR REPLACE FUNCTION IDN_DATA.entitlement_counter(access array)
RETURNS NUMBER
    LANGUAGE java handler = 'JsonCounterWithFilter.entitlement_counter'
as
$$

public class JsonCounterWithFilter {

    public int entitlement_counter(String[] access) {
        int counter = 0;
        for(String acc :access) {
            if(acc.contains("ENTITLEMENT")) {
                counter++;
            }
        }
        return counter;
    }
}
$$;

this does not work as it would looks for the word entitlement anywhere, not just in the field type. I dont have access jackson library to parse this json.



Solution 1:[1]

A short JS UDF can do this count:

create or replace function count_object_in_array(A array, T string, V string)
returns string
language javascript
as
$$
return A.reduce((count, x) => count + (x[T] == V?1:0), 0)
$$;

For example:

create or replace temp table stst as
select parse_json('[
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ROLE"
  },
  {
    "type": "ACCESS_PROFILE"
  }
]') a;


select *, count_object_in_array(a, 'type', 'ENTITLEMENT')
from stst;

Solution 2:[2]

Here's a pure SQL approach:

create or replace temp table T1 as

select parse_json($$
                  
[
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ROLE"
  },
  {
    "type": "ACCESS_PROFILE"
  }
]
                  
 $$) COL1;
                                        
select  VALUE:type::string as TYPE
       ,count(*) as CT
from t1, table(flatten(t1.col1))
group by TYPE

Solution 3:[3]

Lateral flatten should not produce dups (at least not based on your input). You can do an aggregate within the native json commands. See if this will work for your data

WITH JSON_DATA AS(
SELECT 
  PARSE_JSON('[
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ROLE"
  },
  {
    "type": "ACCESS_PROFILE"
  }
]' ) as json)
         SELECT  value:type::string as type, count(*) FROM JSON_DATA, lateral flatten(input => json) 
         GROUP BY type;

results:

TYPE COUNT(*)
ENTITLEMENT 2
ROLE 1
ACCESS_PROFILE 1

Solution 4:[4]

So if we have "more" data as you imply you have than the single demo line:

with data as (
    select column1 as id, parse_json(column2) as json from values 
    ('a', '[ {"type":"ENTITLEMENT"},{"type":"ENTITLEMENT"},{"type":"ROLE"},{"type":"ACCESS_PROFILE"}]'),
    ('b', '[ {"type":"ENTITLEMENT"},{"type":"ROLE"},{"type":"ACCESS_PROFILE"}]'),
    ('c', '[ {"type":"ENTITLEMENT"},{"type":"ENTITLEMENT"},{"type":"ENTITLEMENT"},{"type":"ROLE"},{"type":"ACCESS_PROFILE"}]')
)

And then just flatten that as if is just one huge blob of code, you will have "mixed" up values:

SELECT  
    j.value:type::string as type, count(*) 
FROM data as d, lateral flatten(input => d.json) j
GROUP BY type;
TYPE COUNT(*)
ENTITLEMENT 6
ROLE 3
ACCESS_PROFILE 3

the classic answer is to use some "extra" information in the rows like ID to also group by:

SELECT  
    d.id,
    j.value:type::string as type,
    count(*) as count
FROM data as d, lateral flatten(input => d.json) j
GROUP BY 1,2;
ID TYPE COUNT
a ENTITLEMENT 2
a ROLE 1
a ACCESS_PROFILE 1
b ENTITLEMENT 1
b ACCESS_PROFILE 1
c ENTITLEMENT 3
c ROLE 1
c ACCESS_PROFILE 1
b ROLE 1

Anther way if you have no per row value like ID or it's not unique per row is the SEQ element on the FLATTEN

SELECT  
    j.SEQ,
    j.value:type::string as type,
    count(*) as count
FROM data as d, lateral flatten(input => d.json) j
GROUP BY 1,2;
SEQ TYPE COUNT
1 ENTITLEMENT 2
1 ROLE 1
1 ACCESS_PROFILE 1
2 ENTITLEMENT 1
2 ROLE 1
2 ACCESS_PROFILE 1
3 ROLE 1
3 ACCESS_PROFILE 1
3 ENTITLEMENT 3

Or you can just drop the SEQ as a select and just have it in the GROUP BY, but that might confuse people later in a "why do we have duplicates values"

SELECT  
    j.value:type::string as type,
    count(*) as count
FROM data as d, lateral flatten(input => d.json) j
GROUP BY j.seq, 1;
TYPE COUNT
ENTITLEMENT 2
ROLE 1
ACCESS_PROFILE 1
ENTITLEMENT 1
ROLE 1
ACCESS_PROFILE 1
ROLE 1
ACCESS_PROFILE 1
ENTITLEMENT 3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Felipe Hoffa
Solution 2 Greg Pavlik
Solution 3 P Needleman
Solution 4 Simeon Pilgrim