Spark submit --files not working to copy truststore file to worker nodes in Google Dataproc

I am trying to submit a Spark job using `gcloud dataproc jobs submit spark`. To connect to an Elasticsearch cluster, I need to pass the truststore path.

The job is successful if I copy the truststore file to all the worker nodes and give the absolute path as below:

esSparkConf.put("es.net.ssl.truststore.location","file:///tmp/trust.jks");

But I don't want to do it this way: copying the file to every node becomes impractical as the number of worker nodes grows.
I tried instead to pass the truststore file using the --files option, like below:

gcloud dataproc jobs submit spark --cluster=sprk-prd1  --region=<> --files=trust.jks --class=ESDumpJob --jars=gs://randome/jars/ESDump-jar-with-dependencies.jar

Code snippet from ESDumpJob:

SparkConf sparkConf = new SparkConf(true).setAppName("My ES job");
sparkConf.set("spark.es.nodes.wan.only", "true")
         .set("spark.es.nodes", <es_nodes>)
         .set("spark.es.net.ssl", "true")
         .set("spark.es.net.ssl.truststore.location", "trust.jks")
         .set("spark.es.net.ssl.truststore.pass", "pass")
         .set("spark.es.net.http.auth.user", "test")
         .set("spark.es.net.http.auth.pass", "test");

sparkSession = SparkSession
                    .builder().master("local")
                    .config(sparkConf)
                    .config("spark.scheduler.mode", "FAIR")
                    .getOrCreate();

JavaRDD<MyData> data = //create rdd
JavaEsSpark.saveToEs(data, "my_index", ImmutableMap.of("es.mapping.id", "id"));

In this case I get the error below:

17:15:42 Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Expected to find keystore file at [trust.jks] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.
17:15:42    at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadKeyStore(SSLSocketFactory.java:195)
17:15:42    at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadTrustManagers(SSLSocketFactory.java:226)
17:15:42    at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:173)


Solution 1:[1]

Files passed with --files are staged into each executor's working directory, but the Elasticsearch connector resolves a bare `trust.jks` against the classpath (as the error message says). You need to use org.apache.spark.SparkFiles.get(fileName) to resolve the actual local path of the distributed file on each node, and add the file:// prefix:

sparkConf.set(
    "spark.es.net.ssl.truststore.location",
    "file://" + org.apache.spark.SparkFiles.get("trust.jks"))

See SparkFiles.get and this question.
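Note that SparkFiles.get returns a plain local filesystem path, so the file:// scheme must be prepended by hand to form the URI the connector expects. A minimal, Spark-free sketch of just that string-building step (the TruststoreUri helper class is illustrative, not part of Spark):

```java
// Illustrative helper: turn the absolute path returned by
// org.apache.spark.SparkFiles.get(...) into the file:// URI that
// es.net.ssl.truststore.location expects.
public class TruststoreUri {
    static String toFileUri(String absolutePath) {
        // SparkFiles.get returns something like
        // /tmp/spark-<id>/userFiles-<id>/trust.jks on each node.
        return "file://" + absolutePath;
    }

    public static void main(String[] args) {
        System.out.println(toFileUri("/tmp/spark-123/userFiles-456/trust.jks"));
        // prints file:///tmp/spark-123/userFiles-456/trust.jks
    }
}
```

In the job itself this would look like `sparkConf.set("spark.es.net.ssl.truststore.location", TruststoreUri.toFileUri(SparkFiles.get("trust.jks")))`, evaluated after the SparkContext exists so the file has been distributed.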

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
