PySpark External Table: all fields are empty
I've created a new DataFrame df2:
schema2 = "Product STRING, Quantity INT"
data4 = [('Pen', 9),('Pineappple', 22),('Apple', 12),('Pen', 13)]
df2 = spark.createDataFrame(data4, schema2)
Next, I would like to turn it into an External Table (Unmanaged):
df2.write.format('parquet').save("ExternalTables/ppap_nm")
df2.write.mode('overwrite').option("path", "ExternalTables/ppap_nm").saveAsTable("ppap_nm")
Now, if I try to select data from the newly created table ppap_nm, I get an empty result and the following warning:
spark.sql("select * from ppap_nm").show()
+-------+--------+
|Product|Quantity|
+-------+--------+
+-------+--------+
22/03/11 14:17:36 WARN HadoopFSUtils: The directory file:/home/corbanez/Documents/PySpark/spark-warehouse/desp.db/ExternalTables/ppap_nm was not found. Was it deleted very recently?
What am I doing wrong? Should I try another method for table creation?
Solution 1
- The path must be absolute (you should have gotten an error for that).
- Only one write operation is needed: saveAsTable() with an explicit path both writes the data and registers the external table.
schema2 = "Product STRING, Quantity INT"
data4 = [('Pen', 9),('Pineappple', 22),('Apple', 12),('Pen', 13)]
df2 = spark.createDataFrame(data4, schema2)
df2.write.mode('overwrite').option("path", "/ExternalTables/ppap_nm").format('parquet').saveAsTable("ppap_nm")
spark.table('ppap_nm').show()
+----------+--------+
| Product|Quantity|
+----------+--------+
|Pineappple| 22|
| Apple| 12|
| Pen| 13|
| Pen| 9|
+----------+--------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | David דודו Markovitz |
