Values of the columns are null and swapped in pyspark dataframe

I am using pyspark==2.3.1. I preprocessed my data with pandas, and now I want to port my preprocessing function from pandas to pyspark. But when I read the CSV file with pyspark, many values in a column that actually contains data come out as null. If I then perform any operation on this dataframe, the values of some columns get swapped with those of other columns. I have also tried different versions of pyspark. Please let me know what I am doing wrong. Thanks
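For context, one common cause of "null and shifted" columns is a CSV field that contains an embedded newline or comma: a reader that splits on raw lines breaks the record apart, while a CSV-aware parser keeps it intact. A minimal stdlib sketch (the sample data is hypothetical, not from the question):

```python
import csv
import io

# Hypothetical record where one quoted field contains both a comma
# and a newline -- legal CSV, but fatal for naive line splitting.
raw = 'id,property_type,address\n1,"Apartment","12 Main St,\nSuite 4"\n'

# Naive approach: split on newlines, then on commas.
# The single data record is torn into two malformed rows.
naive_rows = [line.split(',') for line in raw.strip().split('\n')]

# CSV-aware approach: the quoted field survives intact.
proper_rows = list(csv.reader(io.StringIO(raw)))

print(len(naive_rows))    # 3 "rows" -- the record was split in two
print(len(proper_rows))   # 2 rows -- header plus one intact record
print(proper_rows[1][2])  # '12 Main St,\nSuite 4'
```

In Spark this would correspond to reading the file with `spark.read.csv(..., multiLine=True)` (available since Spark 2.2), though whether that is the cause here depends on the actual data.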

Result from pyspark:

[Screenshot: pyspark dataframe output]

The column "property_type" shows null values, although the actual dataframe contains values there instead of null.

CSV File: [screenshot of the CSV file]

But pyspark works fine with small datasets, i.e. [screenshot of a small dataset read correctly]



Solution 1:[1]

Are you limited to the CSV file format? Try Parquet instead. Just save your DataFrame from pandas with .to_parquet() instead of .to_csv(). Spark works really well with this format.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Huvi