Update specific rows on an Oracle/SQL Server DB using JDBC from Databricks

I have a use case where I want to update specific rows, identified by some identifier/WHERE-clause condition, in Oracle or SQL Server from Databricks.

While I can use spark.read.format("jdbc") against either database, I could not find a straightforward way to update specific rows back in these DBs.

If I use

df.write.format("jdbc")
  .option("url", dbServerJdbcUrl)
  .option("user", username)
  .option("password", password)
  .option("driver", <either com.microsoft.sqlserver.jdbc.SQLServerDriver or oracle.jdbc.driver.OracleDriver>)
  .option("dbtable", <table on the database platform>)
  .mode("overwrite") // or other modes
  .save()

it simply overwrites the whole "dbtable" on the database. So far I have not found a way to make this work with .option("query", "<update statement>") either.
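One workaround I can sketch is to bypass df.write entirely and execute the UPDATE through a plain java.sql connection, reusing the JDBC driver jar that is already attached to the cluster. This is not an official Databricks API: `spark._sc._gateway` is a PySpark internal that may change, and the table/column names (`customers`, `status`, `id`) below are purely illustrative.

```python
def build_update_sql(table, assignments, where_clause):
    """Render an UPDATE statement from a dict of column -> SQL literal."""
    set_clause = ", ".join(f"{col} = {val}" for col, val in assignments.items())
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}"

def run_update(jdbc_url, user, pwd, sql):
    """Execute one statement via the JVM's DriverManager; assumes a `spark`
    session is in scope (as in a Databricks notebook)."""
    jvm = spark._sc._gateway.jvm  # PySpark internal, may change between versions
    conn = jvm.java.sql.DriverManager.getConnection(jdbc_url, user, pwd)
    try:
        stmt = conn.createStatement()
        return stmt.executeUpdate(sql)  # number of rows affected
    finally:
        conn.close()

# Usage in a notebook (hypothetical names, as in the question):
# run_update(dbServerJdbcUrl, username, password,
#            build_update_sql("customers", {"status": "'inactive'"}, "id = 42"))
```

For anything beyond literals you control, a PreparedStatement with bound parameters would be the safer variant of the same idea.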

If I write to a temporary/staging table instead, it becomes a two-stage job: I then have to go back to the database platform and update the actual tables from that staging table.
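Those two stages can at least be driven from the same notebook: overwrite a staging table via df.write (safe, since only the staging table is replaced), then issue a single MERGE against the target over a JDBC statement. A sketch of the MERGE construction, with assumed names (`customers`, `customers_stage`, key column `id`); note SQL Server additionally requires a terminating semicolon on MERGE:

```python
def build_merge_sql(target, staging, key, cols):
    """Build a MERGE that upserts staging rows into the target table.
    The syntax below is accepted by both Oracle and SQL Server."""
    on = f"t.{key} = s.{key}"
    sets = ", ".join(f"t.{c} = s.{c}" for c in cols)
    ins_cols = ", ".join([key] + cols)
    ins_vals = ", ".join(f"s.{c}" for c in [key] + cols)
    return (f"MERGE INTO {target} t USING {staging} s ON ({on}) "
            f"WHEN MATCHED THEN UPDATE SET {sets} "
            f"WHEN NOT MATCHED THEN INSERT ({ins_cols}) VALUES ({ins_vals})")

# Stage 1 (in a notebook): land only the changed rows in the staging table.
# changed_df.write.format("jdbc").option("url", dbServerJdbcUrl)
#     .option("user", username).option("password", password)
#     .option("dbtable", "customers_stage").mode("overwrite").save()
#
# Stage 2: execute build_merge_sql("customers", "customers_stage", "id",
#     ["status", "email"]) through a java.sql connection or any DB client.
```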

Another note: when I perform the write above against a table with millions of rows, and I only want to update a handful of them, every save mode just causes more trouble.

  • overwrite - replaces the millions of existing rows with the handful of rows from the df.
  • append - either creates duplicates or eventually fails on key/constraint violations.

Is there a better way to have Databricks update specific rows in a database?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
