Can 2 Spark jobs use a single HDFS/S3 storage simultaneously?
I'm a beginner in Spark. Can I have 2 Spark jobs use a single HDFS/S3 store at the same time? One job will write the latest data to S3/HDFS, and the other will read that data, along with input data from another source, for analysis.
Solution 1:[1]
Yes. In order to use both file systems from the same job (or from two concurrent jobs), include the protocol (URI scheme) in the file paths,
e.g. spark.read.parquet("s3a://bucket/file") and/or df.write.parquet("hdfs:///tmp/data")
Alternatively, you can use S3 in place of HDFS as the default file system by setting fs.defaultFS to an s3a:// URI, so that paths without an explicit scheme resolve to S3.
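As a sketch, that default can be set through Spark's spark.hadoop.* passthrough in spark-defaults.conf (the bucket name and credential values here are placeholders, not real settings):

```properties
# Make schemeless paths resolve to S3 instead of HDFS (hypothetical bucket)
spark.hadoop.fs.defaultFS        s3a://my-bucket
# S3A credentials, if not supplied via an environment/instance profile
spark.hadoop.fs.s3a.access.key   YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key   YOUR_SECRET_KEY
```

The same properties can also go in Hadoop's core-site.xml (without the spark.hadoop. prefix) or be set per-application via SparkSession.builder.config().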
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
