Category "aws-glue"

How to copy data from Amazon S3 to DDB using AWS Glue

I am following the AWS documentation on how to transfer a DDB table from one account to another. There are two steps: export the DDB table into Amazon S3, then use a Glue job to …
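A minimal sketch of that second step, assuming the export landed in S3 and the job's role can write to the target account's table; the bucket, prefix, table name, and write rate below are placeholders, not values from the question:

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the exported files from S3 (placeholder path). Table exports
    # arrive as DynamoDB JSON with attributes nested under an "Item" key,
    # so a mapping/unnest step is usually needed before writing.
    exported = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://my-export-bucket/export-prefix/"]},
        format="json",
    )

    # Write into the target table; the name and write rate are placeholders,
    # and the job's role must have access to the target account's table.
    glue_context.write_dynamic_frame_from_options(
        frame=exported,
        connection_type="dynamodb",
        connection_options={
            "dynamodb.output.tableName": "target-table",
            "dynamodb.throughput.write.percent": "0.5",
        },
    )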

Non-Partitioned Table Schema not updated with Glue ETL Job

We have an ETL job that uses the code snippet below to update the catalog table: sink = glueContext.getSink(connection_type='s3', path=config['glue_s3_path_bc'] …
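For reference, the full getSink pattern that pushes schema changes to the Data Catalog looks roughly like this; the path, database, and table names are placeholders. For a non-partitioned table, partitionKeys stays empty and updateBehavior controls whether the catalog schema is rewritten:

    sink = glueContext.getSink(
        connection_type="s3",
        path="s3://my-bucket/output/",        # placeholder path
        enableUpdateCatalog=True,             # push schema changes to the catalog
        updateBehavior="UPDATE_IN_DATABASE",  # rewrite the table schema on change
        partitionKeys=[],                     # non-partitioned table
    )
    sink.setCatalogInfo(catalogDatabase="my_db", catalogTableName="my_table")
    sink.setFormat("glueparquet")             # a format that supports catalog updates
    sink.writeFrame(dynamic_frame)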

Having trouble setting up multiple tables in AWS Glue from a single bucket

So, I've used Glue before, but only with a single-file-to-single-folder relationship. What I'm trying to do now is to have a structure like this …
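One way to get one table per top-level folder out of a single bucket is the crawler's grouping configuration; a sketch using boto3, where the crawler name, role, database, and bucket are all placeholders:

    import json
    import boto3

    glue = boto3.client("glue")

    # TableLevelConfiguration=2 cuts tables at the second path level
    # (bucket = level 1), so each top-level folder becomes its own table.
    glue.create_crawler(
        Name="multi-table-crawler",                        # placeholder
        Role="arn:aws:iam::123456789012:role/MyGlueRole",  # placeholder
        DatabaseName="my_database",                        # placeholder
        Targets={"S3Targets": [{"Path": "s3://my-bucket/"}]},
        Configuration=json.dumps(
            {"Version": 1.0, "Grouping": {"TableLevelConfiguration": 2}}
        ),
    )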

Spark Catalog w/ AWS Glue: database not found

I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I can successfully list the tables stored in a Glue database via …
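A frequent cause of "database not found" in this setup is building a SparkSession without Hive support, so Spark falls back to its in-memory catalog instead of Glue. A sketch of constructing the session explicitly, assuming the EMR cluster has the Glue Data Catalog integration enabled; the database name is a placeholder:

    from pyspark.sql import SparkSession

    # Build the session with Hive support and the Glue client factory so
    # spark.catalog resolves databases from the Glue Data Catalog.
    spark = (
        SparkSession.builder
        .appName("glue-catalog-check")
        .config(
            "hive.metastore.client.factory.class",
            "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        )
        .enableHiveSupport()
        .getOrCreate()
    )

    spark.sql("SHOW DATABASES").show()
    spark.catalog.setCurrentDatabase("my_glue_db")  # placeholder database name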

Not able to populate AWS Glue ETL Job metrics

I am trying to populate as many Glue job metrics as possible for some testing. Below is the setup I have created: a crawler reads data (dummy customer data of 500 …
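Worth checking in this kind of test: job metrics only flow to CloudWatch when the job is created (or run) with --enable-metrics. A boto3 sketch with placeholder job name, role, and script location:

    import boto3

    glue = boto3.client("glue")

    glue.create_job(
        Name="metrics-test-job",  # placeholder
        Role="MyGlueJobRole",     # placeholder
        Command={
            "Name": "glueetl",
            "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
        },
        DefaultArguments={
            "--enable-metrics": "true",  # older Glue versions accept "" here
            "--enable-continuous-cloudwatch-log": "true",
        },
    )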

exclusions doesn't work in AWS Glue ETL job S3 connection

According to the AWS Glue documentation, we can use exclusions to exclude files when the connection type is s3: https://docs.aws.amazon.com/glue/latest/dg/aws-glue- …
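One detail that commonly trips this up: exclusions has to be a JSON-encoded list of glob patterns passed as a single string inside connection_options, not a Python list. A sketch with placeholder paths and patterns:

    # "exclusions" is a string containing a JSON list of glob patterns.
    dyf = glueContext.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={
            "paths": ["s3://my-bucket/input/"],
            "exclusions": "[\"**.csv\", \"**/_metadata/**\"]",
        },
        format="json",
    )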

Trino iceberg connector "getTablesWithParameter for GlueHiveMetastore is not implemented"

I'm running Trino on EMR version 6.5, and I have added the Iceberg connector and want it to use a Glue catalog. This is the configuration under …
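For context, newer Trino releases let the Iceberg connector talk to Glue directly, which sidesteps the GlueHiveMetastore code path entirely; EMR 6.5 ships an older Trino (360) that predates this option, so the catalog properties below are only a sketch for a version that supports it:

    connector.name=iceberg
    iceberg.catalog.type=glue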

AWS Glue Jupyter Notebook Failed to authenticate user

When I start a job with the IAM role AWSGlueServiceNotebookRoleDefault, I get this error: Failed to authenticate user due to missing information in request. No info …
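That error often means the session could not work out which role or region to use. In Glue interactive sessions, one fix is to state them explicitly with magics in the first notebook cell (the account ID and region below are placeholders) and to make sure the calling identity has iam:PassRole on the notebook role:

    %region us-east-1
    %iam_role arn:aws:iam::123456789012:role/AWSGlueServiceNotebookRoleDefault
    %glue_version 3.0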

AWS Glue Crawler Not Creating Table

I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes. The crawler takes roughly 20 seconds …
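A quick way to see what the run actually did is to pull the crawler's last-crawl status and counters with boto3 (the crawler name below is a placeholder); a successful run with TablesCreated at 0 usually points at the classifiers not matching the data:

    import boto3

    glue = boto3.client("glue")

    crawler = glue.get_crawler(Name="my-crawler")["Crawler"]  # placeholder name
    print(crawler["LastCrawl"]["Status"])

    metrics = glue.get_crawler_metrics(CrawlerNameList=["my-crawler"])
    print(metrics["CrawlerMetricsList"][0]["TablesCreated"])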

AWS Glue Job extracts columns that are not present in Catalog table

It looks like my earlier post was not clear. Here is what I am looking for: I have an AWS Glue catalog table consisting of 29 columns; the source table has 31 columns.
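This happens because from_catalog resolves the schema from the underlying data rather than from the catalog definition, so the two extra source columns come through. A sketch of trimming back to the catalog's columns; the database, table, and field names are placeholders:

    dyf = glueContext.create_dynamic_frame.from_catalog(
        database="my_db",       # placeholder
        table_name="my_table",  # placeholder
    )

    # Keep only the columns defined in the catalog table (list the real
    # 29 names here), or equivalently drop the two extra source columns:
    trimmed = dyf.select_fields(["col_1", "col_2", "col_3"])
    # trimmed = dyf.drop_fields(["extra_col_1", "extra_col_2"])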

Load data from S3 into Aurora Serverless using AWS Glue

According to "Moving data from S3 -> RDS using AWS Glue", I found that an instance is required to add a connection to a data target. However, my RDS is a serverless …
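Aurora Serverless has no instance endpoint, but a Glue JDBC connection only needs an endpoint it can reach from its VPC, subnet, and security group, so pointing the connection at the cluster endpoint can work. A write sketch assuming a Glue connection named "aurora-conn" was set up that way; the table and database names are placeholders:

    # Assumes a Glue connection "aurora-conn" pointing at the cluster
    # endpoint, in a subnet/security group that can reach it.
    glueContext.write_dynamic_frame.from_jdbc_conf(
        frame=dyf,
        catalog_connection="aurora-conn",  # placeholder connection name
        connection_options={"dbtable": "my_table", "database": "my_db"},
    )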

Redshift Spectrum table doesn't recognize array

I ran a crawler on a JSON S3 file to update an existing external table. Once it finished, I checked SVL_S3LOG to see the structure of the external table …
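If the crawler did register the column as an array, note that Redshift cannot select a Spectrum array column directly; nested collections are read by unnesting them in the FROM clause. A sketch with placeholder schema, table, and column names, assuming a file format that supports nested data:

    -- Arrays are read by unnesting them in the FROM clause,
    -- not by selecting the array column directly.
    SELECT t.id, elem
    FROM spectrum_schema.my_table t, t.my_array_col elem;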