What is the ORC double datatype equivalent in Redshift?
I have ORC files with columns stored as the double datatype. These columns are queryable in AWS Athena as numeric(18,0). This is the best I could find on the byte length of the destination Redshift datatype: https://docs.aws.amazon.com/redshift/latest/dg/r_Numeric_types201.html. I tried float4 and float8, but neither worked.
ERROR: Spectrum Scan Error Detail:
-----------------------------------------------
error: Spectrum Scan Error
code: 15007
context: In file https://s3.us-east-1.amazonaws.com/....zlib.orc declared column type DECIMAL for column <test_column> incompatible with ORC file column type double
query: 40933
location: dory_util.cpp:1167
process: worker_thread [pid=1299]
-----------------------------------------------
[ErrorId: 1-6233d72e-4401a9ae4a9f92432ebc9fcf]
Table schema:
CREATE TABLE "schema"."table" (
    col1 float,
    col2 decimal(18,0) encode az64, -- FAILS, source ORC type: double
    col3 float4, -- FAILS, source ORC type: double
    -- col4 numeric(18,0) encode az64, -- AWS Glue representation, source ORC type: double
    col5 character varying(256) encode lzo
);
Code that fails:
COPY "schema"."table"
FROM 's3://.../database/table/'
IAM_ROLE 'arn:aws:iam::123456789:role/TestIAM'
FORMAT AS ORC;
Solution 1:[1]
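Instead of the decimal data type, declare the column as double precision. ORC double is an 8-byte floating-point type, and double precision (alias float8) is the matching floating-point type in Redshift. decimal is a fixed-point type, which is why the scan reports it as incompatible with the ORC column.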
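As a minimal sketch of what that change might look like, the table from the question could be redeclared as below. The column names are taken from the question; the ORC type of col1 is not stated there, so its declaration is left untouched. The az64 encoding on col2 is dropped on the assumption that it is not offered for floating-point columns. The question reports that float8 also failed, so this is a starting point to verify against the actual files rather than a confirmed fix.

```sql
-- Sketch only: assumes col2 and col3 are stored as ORC double,
-- as the comments in the question's schema indicate.
CREATE TABLE "schema"."table" (
    col1 float,                              -- unchanged; FLOAT is an alias for DOUBLE PRECISION
    col2 double precision,                   -- was decimal(18,0) encode az64
    col3 double precision,                   -- was float4, the 4-byte REAL type
    col5 character varying(256) encode lzo   -- unchanged
);
```

The COPY statement from the question can then be rerun unchanged against this table.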
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ashutosh Sharma |