How to send data from S3 to Redshift via Lambda without duplicating rows or truncating the table?
I wish to copy data from S3 to Redshift.
However, the COPY command appends the same rows again every time the Lambda function is triggered:
cur.execute('copy table from S3...... ' )
Can someone suggest other ways to do it without truncating existing data?
For commenters: I tried to push the dataframe directly to Redshift in append mode.
There is a library, pandas_redshift, but it needs an S3 connection first, which might solve the appending issue.
I also tried
# cur.execute('truncate') would keep the table empty before the copy, but I don't have delete rights
cur.execute('select distinct * from ABC.xyz')
cur.execute('copy......')
The results keep appending...
Can someone please provide code or the right sequence of statements to execute?
Solution 1:[1]
Unfortunately, there is no straightforward COPY option that performs an upsert and handles duplicates for you.
If you don't want to truncate the table, there are two workarounds:
You can create a staging table, COPY the data into it first, and then merge it into the target table. That effectively acts as an upsert; see the sketch after the link below.
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html
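A minimal sketch of that staging-table merge, assuming a psycopg2 connection inside the Lambda handler and a hypothetical target table abc.xyz keyed by an id column (the bucket, IAM role, and connection details below are placeholders):

```python
import psycopg2

# Hypothetical connection parameters; in a Lambda you would normally read
# these from environment variables or Secrets Manager.
conn = psycopg2.connect(
    host="my-cluster.xxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="myuser",
    password="mypassword",
)

with conn:  # commits on success, rolls back on error
    with conn.cursor() as cur:
        # 1. Stage the incoming files in a temp table shaped like the target.
        cur.execute("CREATE TEMP TABLE stage (LIKE abc.xyz);")
        cur.execute("""
            COPY stage
            FROM 's3://my-bucket/incoming/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
            FORMAT AS CSV;
        """)
        # 2. Delete the target rows that the staged data replaces, then insert
        #    everything from the staging table: a delete + insert merge (upsert).
        cur.execute("DELETE FROM abc.xyz USING stage WHERE abc.xyz.id = stage.id;")
        cur.execute("INSERT INTO abc.xyz SELECT * FROM stage;")
```

If you cannot get DELETE privileges (as in the question), you can keep the staging COPY and replace the last two statements with an anti-join insert such as `INSERT INTO abc.xyz SELECT s.* FROM stage s LEFT JOIN abc.xyz t ON s.id = t.id WHERE t.id IS NULL;`, so only rows that are not already present get appended.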
You can use a manifest to control which files should be copied and which should be skipped, so files that were already loaded are never read again. https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#copy-command-examples-manifest
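A sketch of the manifest approach, again with hypothetical bucket, key, and role names: build a manifest that lists only the files that have not been loaded yet, upload it to S3, and point COPY at it with the MANIFEST option.

```python
import json
import boto3

# Hypothetical bucket and keys; list only the files that still need loading.
manifest = {
    "entries": [
        {"url": "s3://my-bucket/incoming/part-0001.csv", "mandatory": True},
        {"url": "s3://my-bucket/incoming/part-0002.csv", "mandatory": True},
    ]
}

# Upload the manifest next to the data files.
s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-bucket",
    Key="manifests/load.manifest",
    Body=json.dumps(manifest).encode("utf-8"),
)

# COPY reads only the files listed in the manifest.
copy_sql = """
    COPY abc.xyz
    FROM 's3://my-bucket/manifests/load.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    MANIFEST;
"""
# cur.execute(copy_sql)  # run with the same cursor as in the question
```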
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nishu Tayal |
