'Hive and HQL: how do you format JSON strings to be sorted by keys
in Hive/HQL, how do you format JSON strings to be ordered by their keys? For example
| id | some_str |
|---|---|
| 1 | {"b":1, "c":2, "a":0} |
I want the output to be ordered by the json keys (i.e. a<b<c):
| id | some_str |
|---|---|
| 1 | {"a":0, "b": 1, "c":2} |
I know I might be able to use get_json_object and do some hard-coded formatting, but that would only work if I have only a few keys, but unfortunately that doesn't apply here.
What would you suggest? Thanks!!
Additional Q's:
How do I check equality of
{"b":1, "c":2, "a":0}and{"a":0, "b": 1, "c":2}(their equality should be True)?How do I get the value of the largest/smallest key? i.e. expected results:
| id | some_str |
|---|---|
| 1 | 0 |
(smallest key = "a")
or
| id | some_str |
|---|---|
| 1 | 2 |
(largest key = "c")
Solution 1:[1]
Hive does not support JSON data type, JSON can be parsed by JSONSerDe or get_json_object/json_tuple. If the data in your question were of type map, then it would be sortable/comparable based on keys/values. Strings are comparable as normal strings, no matter is it JSON is inside or not JSON.
Consider storing it as map<string,int> or map<string,string> type.
Also you can convert it to map using str_to_map function (returns map<string, string>) and regexp_replace to remove quotes, spaces, curly braces, then use IN operator to compare. See this demo:
with mydata as (
select '{"b":1, "c":2, "a":0}' as A, '{"a":0, "b": 1, "c":2}' as B
)
select A, B, A in (B), B in (A) from
(
select str_to_map(regexp_replace(regexp_replace(regexp_replace(A,': +',':'),', +',','),'"|\\{|\\}','')) A,
str_to_map(regexp_replace(regexp_replace(regexp_replace(B,': +',':'),', +',','),'"|\\{|\\}','')) B
from mydata
)s
Result:
a b a_equal_b b_equal_a
{"a":"0","b":"1","c":"2"} {"a":"0","b":"1","c":"2"} true true
Both maps are equal, note they are displayed in the same keys order (a<b<c). After conversion to map, you can order, compare and easily extract keys, values and convert values to int if necessary.
Also you can convert JSON string to map specifying types for key and value using brickhouse json_map function, without need to additionally transform JSON string using regexp_replace, this is the most efficient method:
json_map(' {"b":1, "c":2, "a":0}' as A, '{"a":0, "b": 1, "c":2}', 'string,int')
Read how to install brickhouse functions here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
