Are Spark 2 output partitions written as sequentially ordered files?
I am writing a Spark application in Scala. At the end, I have a huge, sorted DataFrame (more than 1 million rows). I was unable to write a dataset this large from the driver itself, so I write the output directly as CSV, which produces one file per partition, named part-00000-..., etc.; together these files make up the original data. When I combine them in a bash shell with cat part* > output.csv, the output preserves the sorted order.
What I would like to know is whether this behavior is guaranteed. If it is not, I will have to carry out an expensive sort operation.
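For reference, here is a minimal sketch of the concatenation step described above, with the file order made explicit. The temporary directory, the `-demo` file names, and the sample rows are hypothetical stand-ins for the real Spark output. It only shows that the zero-padded part numbers are concatenated in ascending order regardless of the shell's locale; it does not settle whether Spark guarantees that the part numbering follows the sort order.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the Spark output directory.
workdir="$(mktemp -d)"

# Fake part files; real Spark names look like part-00000-... (suffix elided).
printf 'a,1\n' > "$workdir/part-00000-demo.csv"
printf 'b,2\n' > "$workdir/part-00001-demo.csv"
printf 'c,3\n' > "$workdir/part-00002-demo.csv"

(
  cd "$workdir"
  # Set the locale on its own line so it is in effect before the glob is
  # expanded; a "LC_ALL=C cat ..." prefix would only affect cat itself.
  LC_ALL=C
  export LC_ALL
  # Zero-padded part numbers sort the same lexically and numerically.
  cat part-*.csv > output.csv
)

cat "$workdir/output.csv"
# prints:
# a,1
# b,2
# c,3
```

The glob `part-*.csv` does not match `output.csv`, so the combined file can live in the same directory without being read back into itself.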
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
