Is there an option to directly delete rows in an ORC file in PySpark or Databricks?
Is there any way to delete rows directly from an ORC file, given its structure?
I am using Azure Databricks.
With the query below I am reading the content of the ORC file, and I want to delete the rows it returns:
%sql select * from orc.`/mnt/my-adls-storage/data/app/simple.orc`
where field='test'
Is there a way to remove these rows directly from the ORC file?
Alternatively, I can:
- Read the ORC file as a DataFrame
- Filter out the records to be deleted
- Write the result back to a new ORC file
- Remove the older one
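The alternative steps above could be sketched in PySpark roughly as follows. This is a minimal sketch, not a confirmed solution: the helper names `keep_filter` and `rewrite_orc_without` and the temporary path are hypothetical, and `spark` is assumed to be the session a Databricks notebook already provides. The result is written to a separate path because overwriting a path that is still being read from is not safe.

```python
def keep_filter(delete_condition: str) -> str:
    """Build a SQL predicate that keeps every row NOT matched by delete_condition.

    coalesce(..., false) treats a NULL comparison result as "no match",
    so rows where the column is NULL are kept instead of being dropped.
    """
    return f"NOT coalesce(({delete_condition}), false)"


def rewrite_orc_without(spark, src_path: str, tmp_path: str, delete_condition: str) -> None:
    """Read src_path, drop the matching rows, and write the rest to tmp_path."""
    df = spark.read.orc(src_path)
    kept = df.filter(keep_filter(delete_condition))
    # Write to a different location than the one being read.
    kept.write.mode("overwrite").orc(tmp_path)


# Usage in a Databricks notebook (where `spark` and `dbutils` are predefined):
# src = "/mnt/my-adls-storage/data/app/simple.orc"
# tmp = src + "_tmp"  # hypothetical temporary location
# rewrite_orc_without(spark, src, tmp, "field = 'test'")
# dbutils.fs.rm(src, recurse=True)       # remove the older data
# dbutils.fs.mv(tmp, src, recurse=True)  # move the filtered data into place
```

The remove-then-move swap at the end is not atomic, so it may be worth checking the temporary output before deleting the original.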
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
