Databricks Python handling with delimiter

I am using Python to create a DataFrame from a CSV file.

The input CSV file looks like this:

[Screenshot of the input CSV file]

After running the following code:

    dataframe_sales = spark.read.format('csv').options(header='true',inferSchema='true').load('/mnt/sadwhpostgre001/excel/goud/sales_file.csv')

I see that some characters in the CompanyName column have moved into the column to the right.

See the output file:

[Screenshot of the output]

How can I create a DataFrame that keeps every value in its original column, so that its structure matches the input CSV file?



Solution 1:[1]

This happens because the company name contains a , (comma), which is also the default column delimiter, so the value is split into two fields.
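To illustrate the cause, here is a minimal sketch that uses Python's standard `csv` module instead of Spark, since both split a line on the delimiter in the same basic way. The sample rows and company names are hypothetical and are not taken from the asker's file.

```python
import csv
import io

# Hypothetical sample: the second data row has an unquoted comma
# inside the company name.
raw = "Id,CompanyName,Amount\n1,Acme Ltd,100\n2,Smith, Jones & Co,250\n"

rows = list(csv.reader(io.StringIO(raw)))
header, first, second = rows

# The header and the first row have 3 fields, but the second row is
# split into 4, so part of the name lands in the next column.
print(len(header), len(first), len(second))
print(second)
```

The extra field is what pushes part of the name into the neighbouring column.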

Suggestion: Change the default delimiter to ; or | or something else when you save the file as a CSV.

Then read it in Databricks with the delimiter option set:

    .option("delimiter","your_delimiter_here")

Update your code by adding the delimiter option (replace "," with the delimiter you chose) and the quote option:

    dataframe_sales = spark.read.format('csv').options(header='true',inferSchema='true').option("delimiter",",").option("quote","\"").load('/mnt/sadwhpostgre001/excel/goud/sales_file.csv')
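As a rough sketch of the "save the file differently" step, the snippet below writes the same hypothetical rows in two ways with Python's standard `csv` module: once with a `|` delimiter, and once with the comma delimiter but every field wrapped in double quotes. The rows and variable names are assumptions made for illustration. In-memory buffers stand in for real files.

```python
import csv
import io

# Hypothetical rows; the second company name contains a comma.
rows = [
    ["Id", "CompanyName", "Amount"],
    ["1", "Acme Ltd", "100"],
    ["2", "Smith, Jones & Co", "250"],
]

# Option A: save with a pipe delimiter so commas in values are harmless.
pipe_buf = io.StringIO()
csv.writer(pipe_buf, delimiter="|").writerows(rows)

# Option B: keep the comma delimiter but wrap every field in double quotes.
quoted_buf = io.StringIO()
csv.writer(quoted_buf, quoting=csv.QUOTE_ALL).writerows(rows)

# Reading each version back with matching settings yields three fields per row.
pipe_rows = list(csv.reader(io.StringIO(pipe_buf.getvalue()), delimiter="|"))
quoted_rows = list(csv.reader(io.StringIO(quoted_buf.getvalue())))

print(pipe_buf.getvalue().splitlines()[2])
print(quoted_buf.getvalue().splitlines()[2])
```

A file saved as in Option A would be read in Spark with `.option("delimiter","|")`. A file saved as in Option B keeps the comma delimiter and relies on the quote option being the double quote character.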

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1