'Creating a file format to deal with a comma in a string
I'm trying to import a csv file into Snowflake. I created a file format as usual but I quickly found out that one of the values had a comma which causing an error.
For example,
1234, John, male, Stack,
1235, Kati, female, Overflow,
1236, Rudy, male, Stack,Overflow
Which results to this
| ID | Name | String1 | String 2| {NULL} |
|----------------------------------------|
|1234| John | male | Stack | |
|1235| Kati | female | Overflow| |
|1236| Rudy | male | Stack |Overflow|
But I want is this:
| ID | Name | String1 | String 2 |
|------------------------------------|
|1234| John | male | Stack |
|1235| Kati | female | Overflow |
|1236| Rudy | male |Stack,Overflow|
I created a file format like this below:
CREATE OR REPLACE FILE FORMAT format_example
TYPE = CSV
COMPRESSION = AUTO
FIELD_DELIMITER = ','
SKIP_HEADER = 1
DATE_FORMAT = "DD-MON-YYYY"
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
EMPTY_FIELD_AS_NULL = TRUE
What should I do? Is there a way for me get the desired outcome? Or is there a way to wrap the strings in quotations in Snowflake?
Solution 1:[1]
Since this is a problem with a malformed CSV, if you can't change it just ingest the whole row and then split inside Snowflake:
with data as (
select '1236, Rudy, male, Stack,Overflow' x
union all select '1234, John, male, Stack,'
union all select '1235, Kati, female, Overflow,'
)
select regexp_substr(x, '[^,]*')
, regexp_substr(x, '[^,]+', 1, 2)
, regexp_substr(x, '[^,]+', 1, 3)
, regexp_substr(x, '[^,]*,[^,]*,[^,]*,(.*[^,])', 1,1,'',1)
from data
For more complex cases you could use a CSV parser in JS or Java.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Felipe Hoffa |

