How do I override the Couchbase query timeout in an AWS Glue job?
I'm trying to run a Couchbase query in an AWS Glue job, using the Spark Couchbase connector. My query is a simple N1QL query against an existing Couchbase bucket:
var queryResultRDD: RDD[CouchbaseQueryRow] = spark.sparkContext.couchbaseQuery(N1qlQuery.simple(CbN1qlQuery))
If I have a long running query, I receive the following error message:
Caused by: com.couchbase.client.java.error.QueryExecutionException: Timeout 1m15s exceeded
at com.couchbase.spark.connection.QueryAccessor$$anonfun$compute$1$$anonfun$apply$2$$anonfun$4.apply(QueryAccessor.scala:56)
at com.couchbase.spark.connection.QueryAccessor$$anonfun$compute$1$$anonfun$apply$2$$anonfun$4.apply(QueryAccessor.scala:53)
at rx.lang.scala.Observable$$anon$32.call(Observable.scala:1324)
The 1m15s timeout setting likely comes from Couchbase's 75s default query timeout, so I tried to add the query timeout setting directly to the query call hoping that it would override the default timeout setting:
var queryResultRDD: RDD[CouchbaseQueryRow] = sc.couchbaseQuery(N1qlQuery.simple(CbN1qlQuery), "bucket-name", Some(Duration(130, SECONDS)))
Dropping that duration down to something impossibly low like 1 ms resulted in a different query timeout error. However, when I made the duration longer, as shown above, I still received the same QueryExecutionException with the timeout still at 1m15s. I also tried setting the system property in the Glue job script:
System.setProperty("com.couchbase.env.timeout.queryTimeout", "1ms")
Yet I still received the same 1m15s timeout error. I also tried setting the spark.couchbase.timeout.queryTimeout property in the same way and got the same result. Setting the query timeout configuration in the SparkSession builder made no difference either:
val Spark = SparkSession
  .builder()
  .appName(DefaultName)
  .config("spark.couchbase.nodes", CbNodes)
  .config(s"spark.couchbase.bucket.$SourceBucketName", SourceBucketPassword)
  .config("spark.couchbase.username", SourceBucketUserName)
  .config("spark.couchbase.password", SourceBucketPassword)
  .config("spark.ssl.enabled", CbSslEnabled)
  .config("spark.ssl.keyStore", CbKeyStore)
  .config("spark.ssl.keyStorePassword", CbKeyStorePassword)
  .config("spark.couchbase.timeout.queryTimeout", "1ms")
  .getOrCreate()
How do I override this 1m15s query timeout setting?
Solution 1:[1]
OK, after quite some fiddling, I finally got something to work.
import io
import tarfile

import imageio
import numpy as np

# Create a dummy image (uint8, so it can be encoded as JPEG without conversion)
im = np.zeros((256, 256), dtype=np.uint8)

# Write the encoded image to an in-memory file
out_file = io.BytesIO()
imageio.imsave(out_file, im, format='jpg')

# Create an in-memory buffer for the tar archive
tarBuffer = io.BytesIO()

# Add the in-memory file to the in-memory tar; the size must be set on the TarInfo
t = tarfile.TarInfo("helloworld.tif")
t.size = len(out_file.getbuffer())
with tarfile.open("foo.tar", mode="w:gz", fileobj=tarBuffer) as tar:
    tar.addfile(t, io.BytesIO(out_file.getbuffer()))

# Write the in-memory tar to disk
tarBuffer.seek(0, 0)
with open('out.tar', 'wb') as dump:
    dump.write(tarBuffer.read())

# Close the in-memory buffers
out_file.close()
tarBuffer.close()
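To check that an archive built this way is readable, it can be opened again straight from memory. The following is only a minimal sketch using the standard library: a plain bytes payload stands in for the encoded image, and the helper names `build_tar_bytes` and `read_member` are hypothetical, not part of the answer above.

```python
import io
import tarfile


def build_tar_bytes(name: str, payload: bytes) -> bytes:
    """Pack one in-memory payload into a gzip-compressed tar and return the archive bytes."""
    buffer = io.BytesIO()
    info = tarfile.TarInfo(name)
    info.size = len(payload)  # the size must be set before calling addfile
    with tarfile.open(mode="w:gz", fileobj=buffer) as tar:
        tar.addfile(info, io.BytesIO(payload))
    # Closing the TarFile flushes the gzip stream but leaves our buffer open
    return buffer.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Read a single member back out of an in-memory gzip-compressed tar."""
    with tarfile.open(mode="r:gz", fileobj=io.BytesIO(archive)) as tar:
        return tar.extractfile(name).read()


# Placeholder payload (assumption: stands in for the JPEG bytes from imageio)
payload = b"placeholder image bytes"
archive = build_tar_bytes("helloworld.tif", payload)
restored = read_member(archive, "helloworld.tif")
```

If `restored` equals `payload`, the archive round-trips correctly and the same bytes can be written to disk or sent elsewhere.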
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Daan |
