PySpark DataFrame not maintaining order after dropping a column
I create a DataFrame:
import pandas as pd
df = spark.createDataFrame(pd.DataFrame({'a': range(12), 'c': range(12)})).repartition(8)
Its contents are:
df.show()
+---+---+
|  a|  c|
+---+---+
|  0|  0|
|  1|  1|
|  3|  3|
|  5|  5|
|  6|  6|
|  8|  8|
|  9|  9|
| 10| 10|
|  2|  2|
|  4|  4|
|  7|  7|
| 11| 11|
+---+---+
But if I drop a column, the rows of the remaining column come back in a different order:
df.drop('c').show()
+---+
|  a|
+---+
|  0|
|  2|
|  3|
|  5|
|  6|
|  7|
|  9|
| 11|
|  1|
|  4|
|  8|
| 10|
+---+
Can someone help me understand what is happening here?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
