Are Spark 2 output partitions written as sequentially ordered files?

I am writing an application in Spark (using Scala). At the end, I have a large, sorted DataFrame (more than 1 million rows). I was unable to write large datasets from the driver itself, so I write the output directly as CSV, one file per partition, named part-00000-..., etc. Together, these files make up the original data. When I combine them in a bash shell with cat part* > output.csv, I see that the output preserves the sort order.
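
For reference, the concatenation step described above can be sketched as follows. This is only a minimal illustration: the temporary directory and the `-demo` file names are hypothetical stand-ins for real Spark output, and the sketch shows only that the shell glob expands in sorted order. It does not establish that Spark itself places the rows in the part files in that order, which is the actual question.

```shell
set -eu

# Hypothetical stand-in for a Spark output directory. Real part files carry
# a longer suffix after the zero-padded partition index.
outdir="$(mktemp -d)"
printf '1\n2\n' > "$outdir/part-00000-demo.csv"
printf '3\n4\n' > "$outdir/part-00001-demo.csv"
printf '5\n6\n' > "$outdir/part-00002-demo.csv"

# Pin the collation locale so the glob expands in plain byte order.
# Because the indices are zero-padded, byte order matches numeric order.
export LC_ALL=C

# The shell sorts the glob matches before cat sees them, so the files are
# concatenated in ascending partition-index order.
cat "$outdir"/part-* > "$outdir/output.csv"
```

The order of the concatenated file therefore depends on two things: the sorted glob expansion shown here, and Spark writing the sorted partitions to files whose indices follow the sort order.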

What I would like to know is whether this behavior is guaranteed. If not, I will have to perform an expensive sort operation.

apache-spark output

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
