Inserting streaming data into an Amazon Redshift cluster
I am trying to insert Spark Streaming data directly into an Amazon Redshift cluster but cannot find the right way to do it.
Below is the code I have, but it first writes to S3 and then copies into Redshift:
# REDSHIFT_JDBC_URL = "jdbc:redshift://%s:5439/%s" % (REDSHIFT_SERVER, DATABASE)
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", REDSHIFT_JDBC_URL) \
    .option("dbtable", TABLE_NAME) \
    .option("tempdir", "s3n://%s:%s@%s" % (ACCESS_KEY, SECRET, S3_BUCKET_PATH)) \
    .mode("overwrite") \
    .save()
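For context, the spark-redshift connector always stages data in S3 and issues a Redshift COPY, so the S3 hop cannot be avoided with this connector. One common way to drive it from a stream is Structured Streaming's foreachBatch, which hands each micro-batch to the batch writer above. The following is a sketch, not a tested setup; the helper names, host, table, and bucket are placeholders:

```python
# Sketch: write each Structured Streaming micro-batch to Redshift via the
# spark-redshift connector. All names below (host, db, table, bucket) are
# placeholders; the connector still stages each batch in S3 before COPY.

def redshift_jdbc_url(server, database, port=5439):
    # Build the JDBC URL in the form the connector expects.
    return "jdbc:redshift://%s:%d/%s" % (server, port, database)

def write_batch_to_redshift(batch_df, batch_id, jdbc_url, table, tempdir):
    # foreachBatch callback: each micro-batch is staged to tempdir (S3)
    # and COPY'd into the target table. Use append, not overwrite, so
    # later batches do not wipe earlier ones.
    (batch_df.write
        .format("com.databricks.spark.redshift")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("tempdir", tempdir)
        .mode("append")
        .save())

# Wiring it into a streaming query (requires a live SparkSession/stream):
# query = (stream_df.writeStream
#     .foreachBatch(lambda df, bid: write_batch_to_redshift(
#         df, bid, redshift_jdbc_url("my-cluster.example.com", "dev"),
#         "my_table", "s3a://my-bucket/tmp/"))
#     .start())
```

Note that COPY latency per micro-batch makes this suitable for near-real-time loads rather than low-latency inserts.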
Does this impact streaming or insertion performance?
Is there any other way to do it?
Solution 1:[1]
Amazon Redshift now supports streaming ingestion natively from Kinesis Data Streams, with no intermediate S3 staging required. The feature is currently in public preview.
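With native streaming ingestion, the Redshift side is configured in SQL: an external schema maps the Kinesis stream, and a materialized view over it performs the ingestion. The sketch below builds those statements as Python strings; the stream name, IAM role ARN, and column expressions are placeholders, and the exact SQL syntax should be checked against the current AWS documentation:

```python
# Sketch of the Redshift-side setup for native Kinesis streaming ingestion
# (public preview at the time of writing). Stream name and role ARN are
# placeholders; verify the syntax against the AWS docs before use.

KINESIS_STREAM = "my_stream"  # placeholder Kinesis Data Stream name
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/redshift-kinesis-role"  # placeholder

# 1) Map the Kinesis stream into Redshift as an external schema.
create_schema_sql = (
    "CREATE EXTERNAL SCHEMA kinesis_schema "
    "FROM KINESIS "
    "IAM_ROLE '%s';" % IAM_ROLE_ARN
)

# 2) A materialized view over the stream does the ingestion -- no S3 staging.
#    kinesis_data arrives as VARBYTE; decoding it to JSON as shown here is
#    an assumption based on the documented examples.
create_mv_sql = (
    "CREATE MATERIALIZED VIEW stream_mv AUTO REFRESH YES AS "
    "SELECT approximate_arrival_timestamp, "
    "json_parse(from_varbyte(kinesis_data, 'utf-8')) AS payload "
    "FROM kinesis_schema.%s;" % KINESIS_STREAM
)
```

Spark (or any producer) then only needs to put records onto the Kinesis stream; Redshift pulls them in when the materialized view refreshes.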
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | faisal_kk |
