AWS Glue - Merge SQL on S3 table

I am developing the ETL for my DWH pipeline using AWS Glue.

My staging data contains updated rows that need to be merged into my dimension tables.

Example with the "User" dimension: in the S3 table "Dim_User", user A has the field "team" equal to 'Sales'. Today my pipeline read data from the sources, and the AWS Glue job wrote to my S3 table "staging_dim_user" that user A now has 'New Sales Dept' in the field "team". Using AWS Glue, how can I merge this change into "Dim_User"? Is it possible to implement a SQL MERGE on S3 through AWS Glue? What are the best practices with AWS Glue and S3 tables in this case?



Solution 1:[1]

You may need to use Athena to merge the data with a query.

Athena cannot combine two sources saved in different folders into a single table on its own, so you might need a query to merge the data:

SELECT * FROM "database"."table_name"
UNION
SELECT * FROM "database"."table_name"
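
Note that a plain UNION only removes rows that are identical in every column, so for the example in the question both the 'Sales' and the 'New Sales Dept' rows for user A would be returned. One possible way to get merge-like behaviour from a query is to take every staging row and add only those dimension rows whose key is absent from staging. The sketch below illustrates that query shape; it uses Python's built-in sqlite3 module as a stand-in for the query engine so it can run locally, and the key column `user_id`, the sample rows, and the lowercase table names are assumptions for illustration, not part of the original question or answer.

```python
import sqlite3

# In-memory SQLite stands in for the real query engine (assumption for the demo).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: user_id is an assumed business key for the dimension.
conn.executescript("""
    CREATE TABLE dim_user (user_id TEXT PRIMARY KEY, team TEXT);
    CREATE TABLE staging_dim_user (user_id TEXT PRIMARY KEY, team TEXT);
    INSERT INTO dim_user VALUES ('A', 'Sales'), ('B', 'Marketing');
    INSERT INTO staging_dim_user VALUES ('A', 'New Sales Dept'), ('C', 'Finance');
""")

# Staging rows win; dimension rows are kept only when staging has no row
# with the same key (anti-join via LEFT JOIN ... IS NULL).
MERGE_QUERY = """
    SELECT user_id, team FROM staging_dim_user
    UNION ALL
    SELECT d.user_id, d.team
    FROM dim_user AS d
    LEFT JOIN staging_dim_user AS s ON s.user_id = d.user_id
    WHERE s.user_id IS NULL
    ORDER BY 1
"""

merged = conn.execute(MERGE_QUERY).fetchall()
for row in merged:
    print(row)
```

With the sample rows above, user A appears once with the staging value, user B is carried over unchanged from the dimension, and user C is added as a new row. The result of such a query would still have to be written back to S3 as the new version of the dimension table, since the query itself does not modify the existing files.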

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
