Connect to PostgreSQL using the V2 Spark SQL JDBC data source
I am trying to connect to PostgreSQL using the Spark SQL JDBC data source. I am using Spark 3.2.1.
I started the Spark shell with the PostgreSQL driver included, and I can connect and execute queries without problems. I am using this statement:
val df = spark.read.format("jdbc").option("url", "jdbc:postgresql://host:port/").option("driver", "org.postgresql.Driver").option("dbtable", "test").option("user", "postgres").option("password", "*******").option("pushDownAggregate",true).load()
I added the pushDownAggregate option because I would like aggregations to be delegated to the source (PostgreSQL), but for some reason this is not happening.
When I evaluate df.queryExecution, it returns the following:
res7: org.apache.spark.sql.execution.QueryExecution =
== Parsed Logical Plan ==
Relation [typesmallint#183,typeinteger#184,typebigint#185L,typenumeric#186,typereal#187,typedoubleprecision#188,typechar#189,typevarchar#190,typetext#191,typebytea#192,typetimestamp#193,typetimestamptz#194,typedate#195,typetime#196,typetimetz#197,typeinterval#198,typeboolean#199,typecidr#200,typeinet#201,typemacaddr#202,typemacaddr8#203,typebit#204,typepoint#205,typeline#206,... 7 more fields] JDBCRelation(test) [numPartitions=1]
== Analyzed Logical Plan ==
typesmallint: smallint, typeinteger: int, typebigint: bigint, typenumeric: decimal(38,0), typereal: float, typedoubleprecision: double, typechar: string, typevarchar: string, typetext: string, typebytea: binary, typetimestam...
Looking at this, I get the impression that the V2 JDBC data source is not being used. If it were in use, the plan would show RelationV2 rather than Relation.
What configuration would be necessary to use the V2 data source?
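For context, one way the V2 JDBC path is commonly reached in Spark 3.2 is by registering a JDBC table catalog and querying the table through that catalog, instead of using spark.read.format("jdbc"). The following is a minimal sketch under stated assumptions, not a confirmed fix for this setup: the catalog name pg, the schema name public, and the database placeholder in the URL are hypothetical, and spark is the session that spark-shell already provides. Passing user and password as catalog options is also an assumption; they can alternatively be appended to the JDBC URL.

```scala
// Run in spark-shell, where `spark` (a SparkSession) is already defined.
// Assumption: "pg" is an arbitrary catalog name chosen for this sketch.
spark.conf.set("spark.sql.catalog.pg", "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
// Assumption: host, port and database are placeholders, as in the question.
spark.conf.set("spark.sql.catalog.pg.url", "jdbc:postgresql://host:port/database")
spark.conf.set("spark.sql.catalog.pg.driver", "org.postgresql.Driver")
spark.conf.set("spark.sql.catalog.pg.user", "postgres")
spark.conf.set("spark.sql.catalog.pg.password", "*******")
// Same option as in the question, now set at the catalog level.
spark.conf.set("spark.sql.catalog.pg.pushDownAggregate", "true")

// Assumption: the table lives in the "public" schema, so it is addressed
// as <catalog>.<schema>.<table>.
val agg = spark.sql("SELECT typeinteger, COUNT(*) FROM pg.public.test GROUP BY typeinteger")

// Print the parsed, analyzed, optimized and physical plans for inspection.
agg.explain(true)
```

If the catalog is picked up, the logical plan printed by explain should contain a V2 relation node rather than Relation ... JDBCRelation(test). Whether a given aggregate is actually pushed down still depends on the Spark version and on the query itself.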
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
