PySpark's JDBC write `stringtype` option doesn't handle NULL values
I am trying to write a DataFrame's data into a Postgres DB using the JDBC driver:
```python
# Note: in PySpark, DataFrame.write is a property, not a method
my_df.write.format('jdbc').mode('append')\
    .option('driver', 'org.postgresql.Driver')\
    .option('url', 'my_url')\
    .option('dbtable', 'my_dbtable')\
    .option('user', 'my_user').save()
```
Apparently PySpark tries to insert all textual types (e.g. a `uuid` column) as text by default, and that throws this error:
```
Caused by: org.postgresql.util.PSQLException: ERROR: column "id" is of type uuid but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
```
In order to overcome that issue I had to set the `stringtype` property to `"unspecified"`:

```python
.option('stringtype', 'unspecified')
```
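For reference, `stringtype=unspecified` is a connection parameter of the PostgreSQL JDBC driver (it makes the driver send strings as untyped parameters so the server can infer `uuid`), so it can alternatively be appended to the JDBC URL itself. A minimal sketch, with placeholder host and database names:

```python
# stringtype=unspecified: the Postgres JDBC driver sends strings untyped,
# letting the server cast them to uuid. Host/db below are placeholders.
base_url = 'jdbc:postgresql://my_host:5432/my_db'
url = base_url + '?stringtype=unspecified'

# Used in place of the plain URL in the writer:
# my_df.write.format('jdbc').mode('append').option('url', url)...
print(url)
```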
But that solution does not work on NULL values, which throw this error:

```
Caused by: org.postgresql.util.PSQLException: ERROR: column "id" is of type uuid but expression is of type character
```
Which basically means that it tries to insert the NULL value as `character`. Splitting the dataset into two datasets (as @Karuhanga suggested in Pyspark nullable uuid type uuid but expression is of type character varying) is not possible in my case. Has anyone faced this issue and found a solution that doesn't hard-code a fix for one specific column?
Solution 1:[1]
Instead of putting a NULL value in the uuid column, use the nil UUID as a default, like this:

```
uuid='00000000-0000-0000-0000-000000000000'
```
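The substitution amounts to a null-coalesce on the uuid column before the write; in PySpark that would be a `fillna` on the column (column name `id` assumed here, as in the error message). A minimal sketch of the logic:

```python
# Nil UUID used as a stand-in for NULL so Postgres never receives a
# typeless NULL it cannot cast to uuid.
NIL_UUID = '00000000-0000-0000-0000-000000000000'

def coalesce_uuid(value):
    """Return the value unchanged, or the nil UUID when it is None."""
    return NIL_UUID if value is None else value

# PySpark equivalent (assuming the uuid column is named 'id'):
# my_df = my_df.fillna({'id': NIL_UUID})
rows = [{'id': 'c0ffee00-0000-0000-0000-000000000001'}, {'id': None}]
fixed = [{'id': coalesce_uuid(r['id'])} for r in rows]
```

The trade-off is that downstream consumers must treat the nil UUID as "no value", since the column no longer contains real NULLs.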
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
