PySpark External Table all fields are empty

I've created a new df2:

schema2 = "Product STRING, Quantity INT"
data4 = [('Pen', 9),('Pineappple', 22),('Apple', 12),('Pen', 13)]

df2 = spark.createDataFrame(data4, schema2)

Next, I would like to turn it into an external (unmanaged) table:

df2.write.format('parquet').save("ExternalTables/ppap_nm") 

df2.write.mode('overwrite').option("path", "ExternalTables/ppap_nm").saveAsTable("ppap_nm")

Now, if I try to select data from the newly created table ppap_nm, the result is empty and I get the following warning:

spark.sql("select * from ppap_nm").show()
+-------+--------+
|Product|Quantity|
+-------+--------+
+-------+--------+

22/03/11 14:17:36 WARN HadoopFSUtils: The directory file:/home/corbanez/Documents/PySpark/spark-warehouse/desp.db/ExternalTables/ppap_nm was not found. Was it deleted very recently?

What am I doing wrong? Should I try another method for table creation?



Solution 1:

  1. The path must be absolute. A relative path is not resolved consistently: save() resolves it against the current working directory, while the table metadata is resolved against the database's warehouse directory at read time (note the spark-warehouse/desp.db/... prefix in the warning), so the table points at a location where no data was written.
  2. Only one write operation is needed: saveAsTable with the path option both writes the Parquet files and registers the table, so the preceding save() call is redundant.
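To see why point 1 matters without hard-coding a machine-specific path, an absolute location can be derived from a relative one. This is a minimal sketch using only the standard library; the base directory it resolves against is whatever the process's working directory happens to be, so in practice you would pick an explicit base that suits your setup:

```python
import os

# A relative path such as "ExternalTables/ppap_nm" is ambiguous: the
# DataFrame writer resolves it against the current working directory,
# while the table read may resolve it against the warehouse directory.
# os.path.abspath pins it to a single, unambiguous location.
table_path = os.path.abspath("ExternalTables/ppap_nm")

print(os.path.isabs(table_path))  # True
```

Passing `table_path` (rather than the relative string) to `option("path", ...)` keeps the written files and the table metadata pointing at the same directory.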

schema2 = "Product STRING, Quantity INT"
data4 = [('Pen', 9),('Pineappple', 22),('Apple', 12),('Pen', 13)]
df2 = spark.createDataFrame(data4, schema2)

df2.write.mode('overwrite').option("path", "/ExternalTables/ppap_nm").format('parquet').saveAsTable("ppap_nm")

spark.table('ppap_nm').show()

+----------+--------+
|   Product|Quantity|
+----------+--------+
|Pineappple|      22|
|     Apple|      12|
|       Pen|      13|
|       Pen|       9|
+----------+--------+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 David דודו Markovitz