'Snowflake - upload CSV - issue with only one accented character
I have an issue with an accented character when I upload a CSV file and then copy it into a table. the weird thing is that most accented letters are just fine, but one is being replaced by '�' when queried.
Another thing, when I use an INSERT statement, no issue whatsoever.
I use an internal stage. here's the file format definition:
create or replace file format MY_FORMAT
type = csv
record_delimiter = '\n'
field_delimiter = ';'
field_optionally_enclosed_by = '"'
skip_header = 1
null_if = ('NULL', 'null')
empty_field_as_null = true
compression = gzip
validate_UTF8=false
skip_blank_lines = true;
The file built in Excel, saved as CSV UTF-8. No other issues, no errors, all my rows are uploaded, just that one character that's supposed to be a "û" that turns out to be "�".
Any ideas?
Thanks,
JFS.
Solution 1:[1]
It could an issue with the terminal being used. Please try and check in a different terminal or web UI. I simulated the scenario and I get the result as expected. Please refer below -
#### Data contents for so_testfile.csv
id;name
1;"û"
2;"û"
3;"û"
4;"û"
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>create or replace stage so_my_stage file_format=SO_MY_FORMAT;
+----------------------------------------------+
| status |
|----------------------------------------------|
| Stage area SO_MY_STAGE successfully created. |
+----------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.138s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>put file://c:\snowflake\so_testfile.csv @so_my_stage;
+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source | target | source_size | target_size | source_compression | target_compression | status | message |
|-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------|
| so_testfile.csv | so_testfile.csv.gz | 39 | 64 | NONE | GZIP | UPLOADED | |
+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 1.100s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>select $1,$2 from @so_my_stage;
+----+----+
| $1 | $2 |
|----+----|
| 1 | û |
| 2 | û |
| 3 | û |
| 4 | û |
+----+----+
4 Row(s) produced. Time Elapsed: 0.308s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>select * from SO_TEST_TAB;
+----+------+
| ID | COL1 |
|----+------|
+----+------+
0 Row(s) produced. Time Elapsed: 0.388s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>copy into SO_TEST_TAB from @so_my_stage;
+--------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|--------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| so_my_stage/so_testfile.csv.gz | LOADED | 4 | 4 | 1 | 0 | NULL | NULL | NULL | NULL |
+--------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 0.833s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>select * from so_test_tab;
+----+------+
| ID | COL1 |
|----+------|
| 1 | û |
| 2 | û |
| 3 | û |
| 4 | û |
+----+------+
4 Row(s) produced. Time Elapsed: 0.263s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>
Solution 2:[2]
It turns out that coding the file properly as CSV UTF-8 from Excel worked. The "û" is now displayed correctly, just like all others accented letters we have in French.
Thanks for your help.
JFS.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pankaj |
| Solution 2 | JFS |

