Spark MongoDB Connector unable to df.join - Unspecialised MongoConfig
Using the latest MongoDB connector for Spark (v10) and trying to join two dataframes yields the following unhelpful error.
Py4JJavaError: An error occurred while calling o64.showString.
: java.lang.UnsupportedOperationException: Unspecialised MongoConfig. Use `mongoConfig.toReadConfig()` or `mongoConfig.toWriteConfig()` to specialize
at com.mongodb.spark.sql.connector.config.MongoConfig.getDatabaseName(MongoConfig.java:201)
at com.mongodb.spark.sql.connector.config.MongoConfig.getNamespace(MongoConfig.java:196)
at com.mongodb.spark.sql.connector.MongoTable.name(MongoTable.java:99)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.name(DataSourceV2Relation.scala:66)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.$anonfun$applyOrElse$2(V2ScanRelationPushDown.scala:65)
The PySpark code simply loads two collections and runs a join:
dfa = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.contacts").load()
dfb = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.accounts").load()
dfa.join(dfb, 'PKey').count()
SQL gives the same error:
dfa.createOrReplaceTempView("usr")
dfb.createOrReplaceTempView("ast")
spark.sql("SELECT count(*) FROM ast JOIN usr on usr._id = ast._id").show()
Document structures are flat.
Solution 1:[1]
Have you tried using the latest version (10.0.2) of mongo-spark-connector?
I had a similar problem and solved it by replacing 10.0.1 with 10.0.2.
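One way to pin the connector version from PySpark is sketched below. It assumes the connector is pulled in through `spark.jars.packages` and that the 10.0.x artifact is published as `org.mongodb.spark:mongo-spark-connector` (no Scala suffix); verify the coordinates against your repository. The helper names `connector_package` and `build_session` are hypothetical, introduced only for illustration.

```python
# Assumed Maven coordinates for the connector; check them against
# Maven Central before relying on this.
CONNECTOR_GROUP = "org.mongodb.spark"
CONNECTOR_ARTIFACT = "mongo-spark-connector"
CONNECTOR_VERSION = "10.0.2"  # the version the answer reports as working


def connector_package(version=CONNECTOR_VERSION):
    """Build the group:artifact:version string expected by spark.jars.packages."""
    return f"{CONNECTOR_GROUP}:{CONNECTOR_ARTIFACT}:{version}"


def build_session(app_name="mongo-join"):
    """Create a SparkSession that requests the pinned connector version.

    pyspark is imported lazily so connector_package() can be used without it.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars.packages", connector_package())
        .getOrCreate()
    )
```

With a session created this way (for example `spark = build_session()`), the read and join code from the question can be rerun unchanged. If a session built against 10.0.1 is still active, it probably needs to be stopped first, since `getOrCreate()` returns the existing session.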
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | FULLHOUSE |
