Error saving a parquet file with partitionBy
I'm trying to save a Parquet file in append mode and am running into an issue when I write to it first from a Windows system and then from a Linux system. Consider the following code.
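// Assumes a SparkSession named `spark` is in scope (as in spark-shell); toDF needs its implicits.
import spark.implicits._

val df = Seq(
  (1, "test name1"),
  (2, "test name2"),
  (3, "test name3")).toDF("id", "name")

// Append to the dataset, writing one directory per distinct value of "name".
df.write.mode("append").partitionBy("name").parquet("D:\\path\\data.parquet")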
When I run this code on a Windows system, I get the Parquet file with three partitions, as expected.

When I run it on a Linux system, it also works, except that the space character in the partition directory names is not encoded as %20 the way it is on Windows.
Now, if I first create the Parquet file (data.parquet) from Windows and then try to append to the same file from Linux, it creates three new partitions and also outputs the following error:
java.io.FileNotFoundException: /path/data.parquet/_SUCCESS (Permission denied)
If I manually encode the space character before appending to the file from Linux, I get %2520 instead of %20, because the % character itself gets encoded:
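import org.apache.spark.sql.functions.{col, regexp_replace}

// Replace each space in "name" with the literal text %20, then partition by the rewritten column.
df.withColumn("newName", regexp_replace(col("name"), " ", "%20"))
  .drop("name")
  .withColumnRenamed("newName", "name")
  .write.mode("append").partitionBy("name").parquet("/path/data.parquet")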
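One way to read the two results above is that the partition value is percent-escaped when the directory name is built, with the space in the escaped set on Windows but not on Linux, and % in the escaped set on both. That is an assumption made for illustration, not a confirmed description of Spark's internals. The plain-Scala sketch below uses a hypothetical helper, `escape`, which is not a Spark API, to show how such a rule would turn a pre-encoded `%20` into `%2520`:

```scala
// Hypothetical illustration only: this is NOT Spark's implementation or API.
object PartitionEscapeSketch {
  // Percent-escape every character of `value` that appears in `toEscape`.
  def escape(value: String, toEscape: Set[Char]): String =
    value.flatMap { c =>
      if (toEscape.contains(c)) "%%%02X".format(c.toInt) else c.toString
    }

  def main(args: Array[String]): Unit = {
    // Assumed escape sets, chosen to mirror the behavior described in the question.
    val windowsLike = Set(' ', '%')
    val linuxLike = Set('%')

    println(escape("test name1", windowsLike))  // test%20name1
    println(escape("test name1", linuxLike))    // test name1
    println(escape("test%20name1", linuxLike))  // test%2520name1
  }
}
```

Under these assumptions, replacing the space with `%20` by hand cannot reproduce the Windows directory name, because the `%` that was just added is itself escaped to `%25`.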
Any idea how to handle this so that both Windows and Linux can append to the same file? What I'm trying to do is encode the space character as %20 in the partition names when I save the file from Linux.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow



