'Snowflake - Escape backslash and double quotes
I have a csv data file in blob and I am trying to create an external table on it. Here's how the file looks like in text editor -
"ABC"|"123"|"ID1"
"XYZ"|"456"|"ID2"
When I create an external table with pipe as delimiter and then query the individual columns, I see these values -
value:c1|value:c2|value:c3
"\"ABC\"" "\"123\"" "\"ID1\""
"\"XYZ\"" "\"345\"" "\"ID2\""
Is there something that I need to add to field_optionally_enclosed_by option? I am easily able to do
select replace(value:c1,'\\"'),replace(value:c2,'\\"'),replace(value:c3,'\\"') from testable
and this gives me back
value:c1|value:c2|value:c3
"ABC" "123" "ID1"
"XYZ" "345" "ID2"
Any suggestions?
Solution 1:[1]
Please try below and see if it helps, it works in my test:
create or replace external table external_table
with location = @test.ericlin_s3_stage/t1/
auto_refresh = true
REFRESH_ON_CREATE = true
FILE_FORMAT = (
TYPE=CSV
COMPRESSION=NONE
REPLACE_INVALID_CHARACTERS = TRUE
FIELD_DELIMITER = '|'
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
ESCAPE_UNENCLOSED_FIELD = '\\'
);
select
value:c1, value:c2, value:c3
from external_table;
+----------+----------+----------+
| VALUE:C1 | VALUE:C2 | VALUE:C3 |
|----------+----------+----------|
| "ABC" | "123" | "ID1" |
| "XYZ" | "456" | "ID2" |
+----------+----------+----------+
select
value:c1::string as c1,
value:c2::string as c2,
value:c3::string as c3
from external_table;
+-----+-----+-----+
| C1 | C2 | C3 |
|-----+-----+-----|
| ABC | 123 | ID1 |
| XYZ | 456 | ID2 |
+-----+-----+-----+
Solution 2:[2]
If your data is always fixed width, and you are want independent values for writting to the external table, then SPLIT and a nested select could help here:
SELECT
get(s,0) as a,
get(s,1) as b,
get(s,2) as c
FROM (
SELECT split(column1,'|') as s
FROM VALUES
('"ABC"|"123"|"ID1"'),
('"XYZ"|"456"|"ID2"')
);
gives:
| A | B | C |
|---|---|---|
| ""ABC"" | ""123"" | ""ID1"" |
| ""XYZ"" | ""456"" | ""ID2"" |
this look funning because the output layer has escaped the inner double quotes. So you could use TRIM to drop those:
SELECT
trim(get(s,0),'"') as a,
trim(get(s,1),'"') as b,
trim(get(s,2),'"') as c
FROM (
SELECT split(column1,'|') as s
FROM VALUES
('"ABC"|"123"|"ID1"'),
('"XYZ"|"456"|"ID2"')
);
| A | B | C |
|---|---|---|
| ABC | 123 | ID1 |
| XYZ | 456 | ID2 |
and if you dislike the sub-select you can use SPLIT_PART
SELECT
trim(split_part(column1,'|',0),'"') as a,
trim(split_part(column1,'|',1),'"') as b,
trim(split_part(column1,'|',2),'"') as c
FROM VALUES
('"ABC"|"123"|"ID1"'),
('"XYZ"|"456"|"ID2"')
;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Simeon Pilgrim |
