'Using Snowflake and XMLGET to parse an XML file
I am trying to parse out this xml code snippet. I am using the snowflake database trying to create a table. Using XMLGET functions, I get null returns in all fields. I am bit confused on how to get the attribute of a tag, I've done things before and have gotten the value of an object but nothing where the value is in the attribute
<Msg_file>
<Game Number="2020000001">
<Msg_game_info>
<Game_info Game_id="2020000001" Game_date="11/01/2020" Game_time="06:00 PM" />
</Msg_game_info>M
</Game>
</Msg_file>
Solution 1:[1]
It works but indeed is not as straightforward as you might think. Here's an example:
create or replace table xmltest(x variant);
insert into xmltest SELECT PARSE_XML(column1) as X FROM VALUES ('<Msg_file><Game Number="2020000001"><Msg_game_info><Game_info Game_id="2020000001" Game_date="11/01/2020" Game_time="06:00 PM"></Game_info></Msg_game_info></Game></Msg_file>');
The following would work:
select XMLGET(X, 'Game'):"@Number" AS game FROM xmltest; --return 2020000001
select XMLGET(VALUE, 'Game_info'):"@Game_date" AS game FROM xmltest, LATERAL FLATTEN( INPUT => X:"$" ) LIMIT 1; --returns "11/01/2020"
To understand better how XML parsing is done read the following article
Solution 2:[2]
Making assumptions about what is arrays and what is not, thus using this data:
<Msg_file>
<Game Number="2020000001">
<Msg_game_info>
<Game_info Game_id="2020000001" Game_date="11/01/2020" Game_time="06:00 PM" />
<Game_info Game_id="2020000001" Game_date="11/01/2020" Game_time="07:00 PM" />
</Msg_game_info>
</Game>
<Game Number="2020000002">
<Msg_game_info>
<Game_info Game_id="2020000002" Game_date="11/01/2021" Game_time="06:01 PM" />
<Game_info Game_id="2020000002" Game_date="11/01/2021" Game_time="07:01 PM" />
</Msg_game_info>
</Game>
</Msg_file>'
with this SQL:
SELECT
get(g.value, '@Number') as number
,get(m.value, '@Game_id') as Game_id
,to_date(get(m.value, '@Game_date')) as Game_date
,to_time(get(m.value, '@Game_time')) as Game_time
FROM xml_data as x
,lateral flatten(to_array(x.xml:"$")) as g
,lateral flatten(to_array(xmlget(g.value,'Msg_game_info'):"$")) as m
;
we get these results:
| NUMBER | GAME_ID | GAME_DATE | GAME_TIME |
|---|---|---|---|
| 2020000001 | 2020000001 | 2020-11-01 | 18:00:00 |
| 2020000001 | 2020000001 | 2020-11-01 | 19:00:00 |
| 2020000002 | 2020000002 | 2021-11-01 | 18:01:00 |
| 2020000002 | 2020000002 | 2021-11-01 | 19:01:00 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sergiu |
| Solution 2 | Simeon Pilgrim |
