AWS Glue preactions: Invalid operation: relation "stage_table" does not exist
I'm trying to use the AWS Glue dynamic frame writer's preactions and postactions to perform a Redshift upsert (merge) as described here: https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-glue-job/
I've pretty much just copy/pasted the code from that page into my own Glue job, except I'm using glueContext.write_dynamic_frame.from_options() instead of glueContext.write_dynamic_frame.from_jdbc_conf().
It almost works, but every time I run the job, it fails while writing the dynamic frame with this error:
Amazon Invalid operation: relation "stage_table" does not exist
Here is the code that causes this error (the db table/column names have been changed for simplicity):
pre_query = 'begin; \
drop table if exists stage_table; \
create table stage_table as select * from target_table where 1=2; \
end;'
post_query = 'begin; \
delete from target_table using stage_table \
where target_table.identifier = stage_table.identifier; \
insert into target_table select * from stage_table; \
drop table stage_table; \
end;'
glueContext.write_dynamic_frame.from_options(
    frame=dy_out,
    connection_type='redshift',
    connection_options={
        'url': dburl,
        'preactions': pre_query,
        'dbtable': 'stage_table',
        'database': 'public',
        'user': dbuser,
        'password': dbpassword,
        'redshiftTmpDir': args['TempDir'],
        'postactions': post_query
    }
)
The error message appears to come from the call to write_dynamic_frame.from_options(). In fact, if I manually add a stage_table to my database and then run the job, the stage_table no longer exists once the job has finished. So it seems that the drop table in the preactions is working but the create table is not, perhaps due to some sort of Glue timing issue.
Note that I have, for the moment, hacked my way around this problem by:
1. creating a permanent version of the stage_table in my Redshift db,
2. changing my preaction to just truncate the stage_table instead of dropping and creating it, and
3. changing my postaction to not drop the stage_table.
In other words, the following code DOES WORK, proving the functionality of the pre and post actions. But I would rather not have the stage tables residing permanently in my db.
pre_query = 'truncate table stage_table;'
post_query = 'begin; \
delete from target_table using stage_table \
where stage_table.identifier = target_table.identifier; \
insert into target_table select * from stage_table; \
end;'
glueContext.write_dynamic_frame.from_options(
    frame=dy_out,
    connection_type='redshift',
    connection_options={
        'url': dburl,
        'preactions': pre_query,
        'dbtable': 'stage_table',
        'database': 'public',
        'user': dbuser,
        'password': dbpassword,
        'redshiftTmpDir': args['TempDir'],
        'postactions': post_query
    }
)
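To compare the two variants side by side while debugging, the pre and post queries could be generated from one place. The following is only a sketch: build_upsert_actions and its permanent_stage flag are hypothetical names introduced here, not part of Glue, and the helper just assembles the same SQL strings shown above. It does not by itself explain or fix the "relation does not exist" error.

```python
def build_upsert_actions(target, stage, key, permanent_stage=False):
    """Return (preactions, postactions) SQL strings for a staged upsert.

    Hypothetical helper for illustration only. Table and column names are
    interpolated directly, so pass trusted identifiers only.
    """
    if permanent_stage:
        # Workaround variant: the stage table already exists, so just empty it.
        pre_statements = [f"truncate table {stage}"]
    else:
        # Original variant: rebuild an empty stage table with the target's columns.
        pre_statements = [
            "begin",
            f"drop table if exists {stage}",
            f"create table {stage} as select * from {target} where 1=2",
            "end",
        ]

    post_statements = [
        "begin",
        f"delete from {target} using {stage} "
        f"where {target}.{key} = {stage}.{key}",
        f"insert into {target} select * from {stage}",
    ]
    if not permanent_stage:
        # Only drop the stage table when it is meant to be temporary.
        post_statements.append(f"drop table {stage}")
    post_statements.append("end")

    def join(statements):
        # Separate statements with semicolons, as in the queries above.
        return "; ".join(statements) + ";"

    return join(pre_statements), join(post_statements)


# Variant that works for me: permanent stage table, truncate only.
pre_query, post_query = build_upsert_actions(
    "target_table", "stage_table", "identifier", permanent_stage=True
)
print(pre_query)
print(post_query)
```

The returned strings can be passed as 'preactions' and 'postactions' in connection_options exactly as in the calls above; switching permanent_stage to False reproduces the drop/create variant that fails for me.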
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
