Mapping mixed case Parquet to Snowflake with ADF
I'm trying to solve an issue with mapping mixed-case fields in a Parquet file to Snowflake, which stores unquoted column names in uppercase.
I'm using Azure Data Factory's copy activity to load a Parquet file into Snowflake. For example, my Parquet schema is (ID, CustomerName, CustomerType). When loading to Snowflake, the ID column is populated, but CustomerName and CustomerType are empty.
Keeping in mind that my source casing can change unexpectedly (e.g. CustomerName -> customername), has anyone seen and solved this issue before?
Solution 1:[1]
If your Parquet files are already in blob storage, you should be able to copy them directly into a Snowflake table with the
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
copy option. https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
Example:
copy into example_table from @example_stage/table_folder/
FILE_FORMAT = (TYPE = PARQUET) -- required unless a Parquet file format is already set on the stage
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE -- match Parquet column names to Snowflake table column names
PATTERN = '.*parquet' -- optional, only copy files whose names end in "parquet"
PURGE = TRUE -- optional, deletes successfully copied files from blob storage (if privileged to do so)
;
ADF now has the Script activity, so you can orchestrate this COPY statement there.
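As a rough sketch, the Script activity in the pipeline JSON could wrap the COPY statement like this; the activity name, linked service name, and stage/table names are placeholders, so adjust them to your environment:

{
    "name": "CopyParquetIntoSnowflake",
    "type": "Script",
    "linkedServiceName": {
        "referenceName": "SnowflakeLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scripts": [
            {
                "type": "Query",
                "text": "copy into example_table from @example_stage/table_folder/ FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
            }
        ]
    }
}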
Side note: Snowflake has schema detection for Parquet, so you can even generate tables directly from your Parquet files.
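A minimal sketch of that schema detection using Snowflake's INFER_SCHEMA table function, which can template a table from staged Parquet files; the stage and file format names here (example_stage, my_parquet_format) are placeholders:

create file format my_parquet_format type = parquet; -- INFER_SCHEMA requires a named file format
create table example_table
    using template (
        select array_agg(object_construct(*)) -- aggregate the detected column definitions
        from table(
            infer_schema(
                location => '@example_stage/table_folder/',
                file_format => 'my_parquet_format'
            )
        )
    ); -- creates the table with column names and types detected from the Parquet files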
Solution 2:[2]
The ADF Snowflake connector utilizes Snowflake's COPY INTO [table] command to achieve the best performance. It supports writing data to Snowflake on Azure.
| Property | Description | Required |
|---|---|---|
| importSettings | Advanced settings used to write data into Snowflake. You can configure the ones supported by the COPY INTO command, which the service will pass through when you invoke the statement. | No |
Option: MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
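A minimal sketch of how the copy activity sink could pass that option through importSettings, assuming the connector forwards it via the additionalCopyOptions pass-through block (verify the option is accepted in your ADF version):

"sink": {
    "type": "SnowflakeSink",
    "importSettings": {
        "type": "SnowflakeImportCopyCommand",
        "additionalCopyOptions": {
            "MATCH_BY_COLUMN_NAME": "CASE_INSENSITIVE"
        }
    }
}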
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | cleveralias |
| Solution 2 | Lukasz Szozda |
