PySpark's write `stringtype` option doesn't handle null values

I am trying to write a DataFrame's data into a Postgres DB using the JDBC driver.

my_df.write.format('jdbc').mode('append')\
    .option('driver', 'org.postgresql.Driver')\
    .option('url', 'my_url')\
    .option('dbtable', 'my_dbtable')\
    .option('user', 'my_user').save()

Apparently PySpark tries to insert all textual types (e.g. uuid) as text by default and throws this error:

Caused by: org.postgresql.util.PSQLException: ERROR: column "id" is of type uuid  but expression is of type character varying

Hint: You will need to rewrite or cast the expression.

To overcome that issue, I had to set the connection property:

'stringtype': 'unspecified'
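The property can be passed either as a `.option('stringtype', 'unspecified')` on the writer, or appended to the JDBC URL itself. A minimal sketch of the URL form, assuming a placeholder Postgres URL (plain Python, no Spark needed):

```python
# Append the stringtype=unspecified connection property to a JDBC URL.
# The URL below is a placeholder, like 'my_url' in the question.
def with_stringtype(url, value='unspecified'):
    sep = '&' if '?' in url else '?'
    return f'{url}{sep}stringtype={value}'

url = with_stringtype('jdbc:postgresql://localhost:5432/my_db')
# -> 'jdbc:postgresql://localhost:5432/my_db?stringtype=unspecified'
```

With `stringtype=unspecified`, the Postgres JDBC driver sends string parameters without an explicit type, letting the server cast them to uuid.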

But that solution does not work on NULL values and throws this error:

Caused by: org.postgresql.util.PSQLException: ERROR: column "id" is of type uuid  but expression is of type character

Which basically means that it tries to insert the NULL value as character. Splitting the dataset into two datasets (as @Karuhanga suggested here: Pyspark nullable uuid type uuid but expression is of type character varying) is not possible in my case. Has anyone faced this issue and found a solution that does not involve fixing a specific column?



Solution 1

Instead of putting a NULL value in the uuid column, use the nil UUID as a default, like this:

uuid='00000000-0000-0000-0000-000000000000'
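A minimal sketch of that substitution in plain Python (the helper and column name are illustrative; in PySpark the equivalent would be something like `my_df.fillna({'id': NIL_UUID})` before the write):

```python
# Nil UUID used as a stand-in for NULL in uuid columns.
NIL_UUID = '00000000-0000-0000-0000-000000000000'

def fill_nil_uuid(value):
    # Replace a missing id with the nil UUID so Postgres never
    # receives a NULL parameter for the uuid column.
    return NIL_UUID if value is None else value

ids = ['123e4567-e89b-12d3-a456-426614174000', None]
filled = [fill_nil_uuid(v) for v in ids]
# -> second element becomes the nil UUID
```

The trade-off is that downstream queries must treat the nil UUID as "missing" rather than a real key.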

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
