Spark stream with console sink wants HDFS write access
I have a simple setup that reads from Kafka and writes to the local console. The SparkSession is created with .master("local[*]"), and I start the stream with:
var df = spark.readStream.format("kafka").options(...).load()
df = df.select("some_column")
df.writeStream.format("console")
  .outputMode("append")
  .start()
  .awaitTermination()
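For completeness, this is roughly the whole job as a self-contained sketch; the Kafka bootstrap servers, the topic, and the selected column are placeholders, since I elided the real options above:

import org.apache.spark.sql.SparkSession

object KafkaToConsole {
  def main(args: Array[String]): Unit = {
    // Local session, no cluster configuration beyond local[*]
    val spark = SparkSession.builder()
      .appName("kafka-to-console")
      .master("local[*]")
      .getOrCreate()

    // Streaming read from Kafka; server and topic are placeholders
    var df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my_topic")
      .load()

    // "value" stands in for the column I actually select
    df = df.select("value")

    // Print each micro-batch to the console
    df.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}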
The same Kafka setup works perfectly fine with a batch/normal DataFrame, but for this streaming job I get the exception:
Permission denied: user=user, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
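For comparison, the batch read that works looks roughly like this (same placeholder server and topic as above):

// Batch read from the same topic; this runs without any HDFS error
val batchDf = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "my_topic")
  .load()

batchDf.select("value").show()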
Why does it want write access to HDFS when I only want to print the data locally to the console? And how can I solve this?
Sources
This question originates from Stack Overflow and is licensed under CC BY-SA 3.0.