AWS Glue Job extracts columns that are not present in Catalog table

It looks like my earlier post was not clear. Here is what I am looking for: I have an AWS Glue catalog table with 29 columns, and the source table has 31 columns. When I run the AWS Glue job, I expected it to extract only the columns present in the catalog table, but the job processes all 31 columns. Why is the Glue job processing columns that are not part of the catalog table?



Solution 1:[1]

Check whether your settings are set to LOG or UPDATE IN DATABASE. If it is set to LOG and data is in the location, it will not overwrite the schema and drop the unknown columns.
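For illustration, here is a minimal sketch of where this setting might live, assuming the answer refers to the `updateBehavior` option of a Glue ETL sink (it could equally mean a crawler's schema change policy). The helper name `catalog_update_options` and the database and table placeholders are hypothetical; the option keys `enableUpdateCatalog` and `updateBehavior` are the ones Glue documents for catalog-updating sinks.

```python
# Values documented for the Glue sink option "updateBehavior".
ALLOWED_UPDATE_BEHAVIORS = {"LOG", "UPDATE_IN_DATABASE"}


def catalog_update_options(update_behavior="UPDATE_IN_DATABASE"):
    """Build an additional_options dict for a catalog-backed Glue sink.

    Hypothetical helper: it only assembles and validates the options.
    """
    if update_behavior not in ALLOWED_UPDATE_BEHAVIORS:
        raise ValueError(f"unsupported updateBehavior: {update_behavior!r}")
    return {"enableUpdateCatalog": True, "updateBehavior": update_behavior}


# Inside a Glue job script the dict would be passed to the sink, e.g.:
# glueContext.write_dynamic_frame_from_catalog(
#     frame=frame,                  # DynamicFrame produced by the job
#     database="my_database",       # placeholder name (assumption)
#     table_name="my_table",        # placeholder name (assumption)
#     additional_options=catalog_update_options("UPDATE_IN_DATABASE"),
# )
```

Passing `"LOG"` instead keeps the same call shape but leaves the catalog schema unchanged.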

You can create a purge function to clear out the data location if you want a clean set of data every time, or you can set the update behaviour to UPDATE IN DATABASE to modify the schema on subsequent runs.
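A purge function along those lines could look roughly like this. It is only a sketch, assuming the data location is an S3 prefix and that you pass in a boto3 S3 client; the function name `purge_prefix` and the bucket and prefix values are made up for illustration.

```python
def purge_prefix(s3_client, bucket, prefix):
    """Delete every object under s3://bucket/prefix and return the count deleted."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" key.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # A page holds at most 1000 keys, which is also the delete_objects limit.
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
        deleted += len(keys)
    return deleted


# Example call before the job writes its output (requires boto3 and credentials):
# import boto3
# purge_prefix(boto3.client("s3"), "my-bucket", "output/my_table/")
```

Within a Glue job script, `glueContext.purge_s3_path` is a built-in alternative; note that its `retentionPeriod` option defaults to keeping recent files, so it would need to be set to 0 to clear everything.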

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Hein Van der Merwe