Inserting streaming data into an Amazon Redshift cluster

I am trying to insert Spark Streaming data directly into an Amazon Redshift cluster, but I am not able to find the right way to do it.

Below is the code I have, but it first writes the data to S3 and then copies it into Redshift:

# Build the JDBC URL for the cluster endpoint and database
REDSHIFT_JDBC_URL = "jdbc:redshift://%s:5439/%s" % (REDSHIFT_SERVER, DATABASE)

# The spark-redshift connector stages the DataFrame in S3 (tempdir),
# then issues a COPY into the target Redshift table
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", REDSHIFT_JDBC_URL) \
    .option("dbtable", TABLE_NAME) \
    .option("tempdir", "s3n://%s:%s@%s" % (ACCESS_KEY, SECRET, S3_BUCKET_PATH)) \
    .mode("overwrite") \
    .save()

Does this S3 staging step impact streaming or insertion performance?

Or is there another way to do it?



Solution 1:[1]

Amazon Redshift now supports streaming ingestion natively from Kinesis Data Streams, with no need for intermediate S3 staging. The feature is in public preview:

https://aws.amazon.com/about-aws/whats-new/2022/02/amazon-redshift-public-preview-streaming-ingestion-kinesis-data-streams/
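The flow can be sketched in Redshift SQL: an external schema maps to the Kinesis stream, and a materialized view consumes records from it. The stream name (my_stream), schema name, and IAM role ARN below are placeholders for illustration, and the exact syntax may change while the feature is in preview:

    -- Map an external schema to Kinesis (role ARN is a placeholder)
    CREATE EXTERNAL SCHEMA kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

    -- The materialized view reads directly from the stream;
    -- kinesis_data arrives as VARBYTE and is decoded here as UTF-8 JSON
    CREATE MATERIALIZED VIEW my_stream_mv AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
    FROM kinesis_schema.my_stream;

    -- Ingestion happens when the view is refreshed
    REFRESH MATERIALIZED VIEW my_stream_mv;

This keeps the data path entirely inside Redshift, avoiding the S3 COPY round trip used by the spark-redshift connector.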

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 faisal_kk