Non-Partitioned Table Schema not updated with Glue ETL Job

We have an ETL job that uses the code snippet below to write the data and update the catalog table:

sink = glueContext.getSink(connection_type='s3', path=config['glue_s3_path_bc'], enableUpdateCatalog=True, updateBehavior='UPDATE_IN_DATABASE')
sink.setFormat('glueparquet')
sink.setCatalogInfo(catalogDatabase=config['glue_db'], catalogTableName=config['glue_table_bc'], catalogId=args['catalog_id'])
sink.writeFrame(dyF)

The table is non-partitioned and needs to be overwritten with new data daily. Since glueContext does not support overwrite, we use the purge_s3_path and purge_table methods to empty the S3 location in a step just before the write above. We do the same for partitioned tables, and it has been working fine for us so far.
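For reference, the purge-then-write sequence looks roughly like the sketch below. The function name and its parameters are placeholders made up for this post, and `retentionPeriod` is set to 0 on the assumption that every existing file should go (by default, files newer than 168 hours are retained). The Glue context is passed in as an argument so the function can be exercised outside a Glue job.

```python
def overwrite_unpartitioned_table(glue_context, dyf, s3_path, database, table, catalog_id=""):
    """Empty the S3 prefix, then write the frame with catalog updates enabled.

    Hypothetical helper: glue_context is expected to be an awsglue GlueContext
    and dyf a DynamicFrame.
    """
    # Purge existing objects first, because the sink itself only appends.
    # retentionPeriod is in hours; 0 means recent files are purged as well.
    glue_context.purge_s3_path(s3_path, options={"retentionPeriod": 0})

    # Same sink configuration as in the snippet above.
    sink = glue_context.getSink(
        connection_type="s3",
        path=s3_path,
        enableUpdateCatalog=True,
        updateBehavior="UPDATE_IN_DATABASE",
    )
    sink.setFormat("glueparquet")
    sink.setCatalogInfo(
        catalogDatabase=database,
        catalogTableName=table,
        catalogId=catalog_id,
    )
    sink.writeFrame(dyf)
    return sink
```

In the job this would be called as overwrite_unpartitioned_table(glueContext, dyF, config['glue_s3_path_bc'], config['glue_db'], config['glue_table_bc'], args['catalog_id']).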

Recently, the schema of the data was updated (a few new columns were added). When the ETL job completed, it had successfully updated the partitioned table with the new schema, but the schema of the non-partitioned table was unchanged. We verified by inspecting the S3 files directly that the new fields are present in the data files. Why is the schema not updated the way it is for the partitioned table? Is there a different method that we can use?
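One fallback we are considering, which does not explain the behavior and is untested against our tables, is to patch the table definition directly through the Data Catalog API with boto3 after the write. The sketch below is a minimal version of that idea. The helper names `build_table_input` and `sync_table_schema` are made up here, and the list of new columns (name and Glue type) is assumed to be supplied by hand. `get_table` returns read-only fields that `update_table` rejects, so only the keys accepted by `TableInput` are copied over.

```python
import copy

# Keys that update_table accepts inside TableInput; get_table returns extra
# read-only fields (DatabaseName, CreateTime, UpdateTime, ...) that must be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def build_table_input(table, new_columns):
    """Return a TableInput dict with any missing columns appended.

    table: the "Table" dict from glue.get_table().
    new_columns: list of {"Name": ..., "Type": ...} dicts (assumed input).
    """
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    columns = table_input["StorageDescriptor"]["Columns"]
    existing = {col["Name"] for col in columns}
    for col in new_columns:
        # Append only columns the catalog does not already know about.
        if col["Name"] not in existing:
            columns.append({"Name": col["Name"], "Type": col["Type"]})
            existing.add(col["Name"])
    return table_input


def sync_table_schema(database, table_name, new_columns, catalog_id=None):
    """Fetch the table, add the missing columns, and write it back."""
    import boto3  # imported here so build_table_input works without boto3

    glue = boto3.client("glue")
    extra = {"CatalogId": catalog_id} if catalog_id else {}
    table = glue.get_table(DatabaseName=database, Name=table_name, **extra)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=build_table_input(table, new_columns),
        **extra,
    )
```

This would run once after sink.writeFrame(dyF), for example sync_table_schema(config['glue_db'], config['glue_table_bc'], [{"Name": "new_col", "Type": "string"}], args['catalog_id']), where new_col is a placeholder column name.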



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
