Category "aws-glue-spark"

AWS Glue - IllegalArgumentException: Duplicate value for path

I have a messy data source where some field values can come in with two different names but should map to one conformed field name on the output. e.g. data sour

Cast Issue with AWS Glue 3.0 - PySpark

I'm using Glue 3.0. data = [("Java", "6241499.16943521594684385382059800664452")] rdd = spark.sparkContext.parallelize(data) df = rdd.toDF() df.show() df.select(

AWS Glue NoClassDefFoundError on job.init()

I'm trying to debug AWS Glue scripts locally using the Glue ETL library. I have installed aws-glue-libs and spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz. When I run job.init

Show Method for Dynamic Frame in AWS Glue returns empty field

When I call dyF.show(), it returns an empty field, even though I checked the schema and count() and I know the table is populated. I transformed it int

Glue Dynamic Frame Parse text file with ¶ delimiter

I have a text file that looks like the one below: HDR¶20200101 BDY¶1¶Jimmy BDY¶1¶Something TRL¶123 I would like to parse it to a Glue Dyn

Non-Partitioned Table Schema not updated with Glue ETL Job

We have an ETL job that uses the below code snippet to update the catalog table: sink = glueContext.getSink(connection_type='s3', path=config['glue_s3_path_bc']