Spark stream with console sink wants HDFS write access

I have a simple setup that reads from Kafka and writes to the local console.

The SparkSession is created with .master("local[*]") (sketched in full after the snippet below), and I start the stream with:

var df = spark.readStream.format("kafka").options(...).load()
df = df.select("some_column")

df.writeStream.format("console")
  .outputMode("append")
  .start()
  .awaitTermination()
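
For completeness, the session described above is presumably created along these lines (the app name is just a placeholder, not taken from the original code):

import org.apache.spark.sql.SparkSession

// Local-mode session as described above; "KafkaConsoleDemo" is a made-up name
val spark = SparkSession.builder()
  .appName("KafkaConsoleDemo")
  .master("local[*]")
  .getOrCreate()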

The same Kafka setup works perfectly fine with a batch/normal DataFrame, but for this streaming job I get the exception:

Permission denied: user=user, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
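
For comparison, the working batch read presumably looks something like the following (the bootstrap servers and topic are placeholders, not taken from the original code):

// Batch read of the same topic; unlike readStream, this runs without complaint
val batchDf = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder
  .option("subscribe", "some_topic")                     // placeholder
  .load()

batchDf.selectExpr("CAST(value AS STRING)").show(truncate = false)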

Why does it want write access to HDFS when all I want is to get the data locally to the console? And how can I solve this?
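
A likely cause is that Structured Streaming keeps checkpoint/metadata files for every query and resolves their location through the Hadoop default filesystem, so with fs.defaultFS pointing at HDFS even a console sink ends up trying to create its checkpoint directory under "/" on HDFS. A minimal sketch of the usual workaround, assuming an explicit local checkpointLocation is acceptable (the path below is a placeholder):

// Sketch: point the streaming checkpoint at the local filesystem instead of HDFS
df.writeStream
  .format("console")
  .outputMode("append")
  .option("checkpointLocation", "file:///tmp/console-checkpoint")  // placeholder path
  .start()
  .awaitTermination()

// Alternatively, set it once for the whole session:
// spark.conf.set("spark.sql.streaming.checkpointLocation", "file:///tmp/spark-checkpoints")

Pointing fs.defaultFS at the local filesystem for this local job would also avoid the HDFS write, but an explicit checkpointLocation is usually the smaller change.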


