'Snowflake "Invalid UTF8 detected in string SNOWFLAKE" using PUT
I've got a csv file with this content:
"Compañía","Aeropuerto Base","Año","Clase","Grupo Compañía","Mes","Movimiento","País","Servicio","Tipo Avión","Tipo Tráfico","Operaciones Totales"
"2 EXCEL AVIATION LTD","ADOLFO SUÁREZ MADRID-BARAJAS","2020","UE SCHENGEN","Total","","","","","","","0"
"2 EXCEL AVIATION LTD","ADOLFO SUÁREZ MADRID-BARAJAS","2020","UE NO SCHENGEN","Total","","","","","","","4"
"2 EXCEL AVIATION LTD","ADOLFO SUÁREZ MADRID-BARAJAS","2020","INTERNACIONAL","Total","","","","","","","2"
I've uploaded it to snowflake using the stage feature:
PUT 'file://C:\\tmp\\opc2020.csv' @demo_stage;
I've created a file format:
CREATE OR REPLACE FILE FORMAT demo_file_format TYPE = 'CSV' field_delimiter = ',';
If I try to query the content:
SELECT C.$1 FROM @demo_stage (file_format => 'demo_file_format') C
I get an error:
SQL Error [100144] [22000]: Invalid UTF8 detected in string '0xFF0xFE"0x00C0x00o0x00m0x00p0x00a0x000xF10x000xED0x00a0x00"0x00'
File 'opc2020.csv.gz', line 1, character 1
Row 1, column "TRANSIENT_STAGE_TABLE"["$1":1]
If I add the VALIDATE_UTF8 = false attribute then I can query the stage but losing the UTF8 characters and with some unexpected whitespace between characters:
CREATE OR REPLACE FILE FORMAT dbt_demo_file_format TYPE = 'CSV' field_delimiter = ',' VALIDATE_UTF8 = TRUE;
��" C o m p a � � a " |
" 2 E X C E L A V I A T I O N L T D "
" 2 E X C E L A V I A T I O N L T D "
" 2 E X C E L A V I A T I O N L T D "
" 2 E X C E L A V I A T I O N L T D "
" 2 E X C E L A V I A T I O N L T D "
" 2 E X C E L A V I A T I O N L T D "
" 2 E X C E L A V I A T I O N L T D "
How can I solve this?
Solution 1:[1]
If the original file was generated using a different character set than UTF-8 then you would get into this issue.
If you know the character set used to generate the file then you can set into the FILE FORMAT statement the ENCODING parameter to the correct value.
In your case if the original file was created using UTF-16LE then your CREATE FILE FORMAT would look like:
CREATE OR REPLACE FILE FORMAT demo_file_format TYPE = 'CSV' field_delimiter = ',' ENCODING = 'UTF-16 LE' VALIDATE_UTF8 = TRUE FIELD_OPTIONALLY_ENCLOSED_BY = '"';
More information on character sets and encoding is on our docs here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sergiu |
