How to fix "File file:/tmp/delta-table does not exist" in Delta Lake?

Hello dear programmers,

I am currently setting up Delta Lake with Apache Spark. For the Spark master and worker I am using the image docker.io/bitnami/spark:3.

What I am trying to do is create a new Delta table from my Python application via the Spark master/worker I set up. However, when I try to save the table I get the following error: File file:/tmp/delta-table/_delta_log/00000000000000000000.json does not exist.

This might have something to do with the worker/master containers not being able to access my local files, but I am not sure how to fix that. I also looked into using HDFS, but should I be running a separate HDFS server for that, or is it built into Delta Lake already?
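
Before showing my actual code, here is a minimal sketch of the direction I am considering, assuming the fix is to write to a path that every container can reach. The /shared-data mount point and the namenode host below are hypothetical, not part of my current setup:

import pyspark
from delta import *

builder = pyspark.sql.SparkSession.builder.appName("SharedPathSketch") \
    .master("spark://spark:7077") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write to a location that exists in the driver, master, and worker
# containers alike; "/shared-data" is a hypothetical shared bind mount.
spark.range(0, 5).write.mode("overwrite").format("delta").save("file:///shared-data/delta-table")

# An hdfs:// URI would also avoid the problem, but only if a reachable
# HDFS cluster is actually running; Delta Lake itself does not run one.
# The host name "namenode" here is hypothetical:
# spark.range(0, 5).write.format("delta").save("hdfs://namenode:8020/delta-table")

Would that be the right approach, or is there a simpler way to share the path between containers?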

The code of my application looks as follows:

import pyspark
from delta import *

# configure_spark_with_delta_pip adds the matching Delta package itself,
# so the explicit spark.jars.packages entry below is redundant.
builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .master("spark://spark:7077") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.1.0")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write a small table in Delta format, then read it back to verify.
data = spark.range(0, 5)
data.write.mode("overwrite").format("delta").save("/tmp/delta-table")

df = spark.read.format("delta").load("/tmp/delta-table")
df.show()
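
To check whether a path mismatch is really the problem, a small diagnostic could be appended to the script above; it reuses the spark session and compares what the driver and the executors each see on their local filesystems:

import os

# Does the driver container see the Delta log directory?
print("driver:", os.path.exists("/tmp/delta-table/_delta_log"))

# Do the executor containers see it? Each task checks its own filesystem.
print("executors:",
      spark.sparkContext
           .parallelize(range(2))
           .map(lambda _: os.path.exists("/tmp/delta-table/_delta_log"))
           .collect())

If the driver prints True while the executors print False (or the reverse), the containers are not sharing the filesystem and the error comes from the path, not from Delta Lake itself.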

