Overwriting a file in Azure Data Lake Gen2 from a Synapse Notebook throws an exception
As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue described below.
I read a CSV file from Azure Data Lake Storage Gen2 into a PySpark dataframe using the following command:
df = spark.read.format('csv').option("delimiter", ",").option("multiline", "true").option("quote", '"').option("header", "true").option("escape", "\\").load(csvFilePath)
After processing this file, we need to overwrite it in place, which we do with the following command:
df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode = 'overwrite', header = 'true')
What this does is delete the existing file at the path "csvFilePath", and then fail with the error "Py4JJavaError: An error occurred while calling o617.csv."
Things I've noticed:
- Once the CSV file at path "csvFilePath" is deleted by the overwrite command, the data in dataframe "df" is also lost.
- It looks like Spark re-reads the source file at write time, whereas in Databricks we did not have this issue and the overwrite ran successfully.
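The behaviour in the bullets above is consistent with Spark's lazy evaluation: the dataframe is a query plan that re-reads the source path only when the write action runs, so deleting the file first destroys the input. Here is a minimal pure-Python sketch of the same pitfall, with a lazy generator standing in for the lazy DataFrame and `list()` standing in for materializing it first (roughly analogous to `df.cache()` followed by an action such as `df.count()` before the overwrite):

```python
import os
import tempfile

# Create a small CSV file to play the role of the file at csvFilePath.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "data.csv")
with open(csv_path, "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")

def read_rows(path):
    """Lazy reader: nothing is read until the generator is consumed,
    just as a Spark DataFrame re-reads its source when an action runs."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n").split(",")

# Pitfall: delete the source first, then try to consume the lazy reader.
lazy = read_rows(csv_path)
os.remove(csv_path)
try:
    rows = list(lazy)  # fails: the source file is already gone
except FileNotFoundError:
    rows = None

# Fix: materialize BEFORE touching the source file.
with open(csv_path, "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")
materialized = list(read_rows(csv_path))  # fully read into memory
os.remove(csv_path)                       # now the delete is harmless
```

This is only an analogy for the Spark behaviour, not Synapse-specific code; the point is that the data must be materialized before the source path is deleted.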
[Error returned by Synapse Notebook at write command.][1] [1]: https://i.stack.imgur.com/Obj9q.png
Solution 1:
One suggested approach is to mount the data storage rather than accessing it directly by path. Please refer to the documentation below:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark
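Independently of mounting, a common way to make such an overwrite safe is to avoid reading and writing the same path in one plan: write the result to a temporary path first, then replace the original. In PySpark this could mean writing with `df.write.csv(tmpPath, ...)` and then moving the output into place (the exact file-utility API depends on your runtime). A sketch of the write-to-temp-then-swap pattern in plain Python, with the hypothetical helper name `safe_overwrite`:

```python
import os
import tempfile

def safe_overwrite(path, new_text):
    """Write to a temp file in the same directory, then atomically
    replace the original, so the source is never deleted before the
    new data has been fully written."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_text)
        os.replace(tmp, path)  # atomic rename over the original
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)

# Demo: create a file, then overwrite it with "processed" contents.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "data.csv")
with open(target, "w") as f:
    f.write("id,name\n1,alice\n")
safe_overwrite(target, "id,name\n1,ALICE\n")
with open(target) as f:
    result = f.read()
```

The key design point carries over to Spark: the original file is only removed after the replacement data exists in full, so a failure mid-write never loses the source.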
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SairamTadepalli-MT |
