Databricks Python handling with delimiter
I am using Python in order to make a dataframe based on a CSV file.
The input CSV file looks like this:
After running the following code:
dataframe_sales = spark.read.format('csv').options(header='true',inferSchema='true').load('/mnt/sadwhpostgre001/excel/goud/sales_file.csv')
I see that some characters of the CompanyName column have moved into the column to the right.
See the output file:
How can I create a dataframe that keeps each value in its original column, so that the dataframe has the same structure as the input CSV file?
Solution 1:[1]
This is because the CompanyName value contains a , (comma), which the CSV reader treats as a column separator.
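To illustrate the cause outside of Spark, here is a minimal sketch using Python's standard csv module. The two-column sample data and the parse helper are made up for illustration and are not taken from the question's file.

```python
import csv
import io

# Hypothetical sample: the company name contains a comma.
unquoted = "CompanyName,Amount\nAcme, Inc.,100\n"
quoted = 'CompanyName,Amount\n"Acme, Inc.",100\n'

def parse(text, delimiter=","):
    """Return the rows of a CSV string as lists of fields."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

unquoted_rows = parse(unquoted)
quoted_rows = parse(quoted)

# Without quotes the name is split at the comma, so the row has one
# field too many and every later value shifts to the right.
print(unquoted_rows[1])
# With quotes the comma stays inside the CompanyName field.
print(quoted_rows[1])
```

The unquoted data row has three fields for a two-column header, which is the same kind of shift seen in the dataframe.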
Suggestion: Change the default delimiter to ; or | or something else when you save the file as a CSV.
Then read it from Databricks with the delimiter option enabled:
.option("delimiter","your_delimiter_here")
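As a rough sketch of the "save with another delimiter" suggestion, the example below writes hypothetical rows with ; as the delimiter using the standard csv module and reads them back. The sample rows are assumptions; the real file would be exported from whatever tool produced it.

```python
import csv
import io

# Hypothetical rows; the company name contains a comma.
rows = [["CompanyName", "Amount"], ["Acme, Inc.", "100"]]

# Write the rows with ';' as the delimiter instead of ','.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerows(rows)
semicolon_text = buffer.getvalue()

# Reading with the same delimiter keeps the comma inside the name.
round_trip = list(csv.reader(io.StringIO(semicolon_text), delimiter=";"))
print(round_trip[1])
```

A file saved this way would then be read in Databricks with .option("delimiter", ";").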
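Please update your code by adding the delimiter and quote options. With , as the delimiter and " as the quote character, commas inside double-quoted values are not treated as separators: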
dataframe_sales = spark.read.format('csv').options(header='true',inferSchema='true').option("delimiter",",").option("quote","\"").load('/mnt/sadwhpostgre001/excel/goud/sales_file.csv')
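The same read can be wrapped in a small helper so the delimiter and quote character are easy to change. This is only a sketch: sales_csv_options and read_sales are hypothetical names, and spark is assumed to be an existing SparkSession (as it is in a Databricks notebook).

```python
def sales_csv_options(delimiter=",", quote='"'):
    """Build the CSV reader options as a plain dict of strings."""
    return {
        "header": "true",
        "inferSchema": "true",
        "delimiter": delimiter,
        "quote": quote,
    }

def read_sales(spark, path, delimiter=",", quote='"'):
    """Read a CSV file into a DataFrame; `spark` is an existing SparkSession."""
    options = sales_csv_options(delimiter, quote)
    return spark.read.format("csv").options(**options).load(path)

# Options for a file that was saved with ';' as the delimiter.
opts = sales_csv_options(delimiter=";")
print(opts)
```

In a notebook this could be called as read_sales(spark, '/mnt/sadwhpostgre001/excel/goud/sales_file.csv'). The quote option only helps if the values containing commas are actually wrapped in double quotes in the file.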
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |


