How to fix "File file:/tmp/delta-table does not exist" in Delta Lake?
Hello dear programmers,
I am currently setting up Delta Lake with Apache Spark. For the Spark worker and master I am using the image docker.io/bitnami/spark:3.
Via my Python application, I am trying to create a new table of type delta through the Spark master/worker I set up. However, when I try to save the table I get the following error: File file:/tmp/delta-table/_delta_log/00000000000000000000.json does not exist.
This might have something to do with the worker/master containers not being able to access my local files, but I am not sure how to fix it. I also looked into using HDFS: should I run a separate server for it, or is it already built into Delta Lake?
The code of my application looks as follows:
import pyspark
from delta import *
# Build the session against the standalone master; configure_spark_with_delta_pip
# attaches the Delta Lake package matching the installed delta-spark version.
builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .master("spark://spark:7077") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.1.0")

spark = configure_spark_with_delta_pip(builder).getOrCreate()
# Create a small DataFrame, write it out as a Delta table, then read it back.
data = spark.range(0, 5)
data.write.mode("overwrite").format("delta").save("/tmp/delta-table")

df = spark.read.format("delta").load("/tmp/delta-table")
df.show()
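My current guess is that the fix is to point the save path at storage that the driver and all the workers can reach under the same path. Here is a minimal sketch of the two options I am considering, where the mount point /opt/shared and the address namenode:9000 are both placeholders I made up, not things from my actual setup:

# Option 1 (assumption): a volume mounted at the same path, e.g. /opt/shared,
# inside the driver container and every Spark worker container.
data.write.mode("overwrite").format("delta").save("file:///opt/shared/delta-table")

# Option 2 (assumption): a separate HDFS cluster that every container can reach;
# namenode:9000 is a placeholder for the real namenode host and port.
data.write.mode("overwrite").format("delta").save("hdfs://namenode:9000/delta-table")

Is one of these the right direction, or am I misreading the cause of the error?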