Saving a PySpark DataFrame to MongoDB gives an error

I am trying to save a PySpark DataFrame to MongoDB from a Google Cloud Dataproc cluster, but it keeps failing with an error. I'm using Spark 2.4.7, Python 3.7, and the MongoDB Spark connector 2.4.3. Here is my code:

from pyspark.sql import SparkSession

spark = SparkSession.builder\
                    .master("yarn")\
                    .appName("demo")\
                    .config("spark.mongodb.input.uri",
                             "mongodb+srv://my_host:27017/people_db") \
                    .config("spark.mongodb.output.uri",
                            "mongodb+srv://my_host:27017/people_db") \
                    .config('spark.jars.packages',
                            'org.mongodb.spark:mongo-spark-connector_2.12-2.4.3')\
                    .getOrCreate()
df = spark.read\
          .format('csv')\
          .options(header=True)\
          .load(csv_path)

# ----------Some data processing -----------

# This is the block of code that shows the error
df.write\
  .format("com.mongodb.spark.sql.DefaultSource")\
  .mode("append")\
  .option("collection", "people")\
  .save()

Here is the error message:

[error message was posted as a screenshot; its text is not available here]



Solution 1:[1]

The MongoDB driver jar is not included on the classpath. Both Mongo jars (the connector and the Java driver) must be on the Spark classpath (the spark/jars directory). I was able to run this locally and as a Dataproc job by referring to the link below, with these versions:

Mongo connector: mongo-spark-connector_2.12, version 3.0.1
Mongo Java driver: 3.12
Spark: 3.0.2

Mongo dependencies required
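
As a minimal sketch of this setup under the versions listed above, one option is to give Spark the connector's full Maven coordinate (groupId:artifactId:version) so it resolves the MongoDB Java driver transitively, rather than copying both jars into spark/jars by hand. The host, bucket path, database, and collection names below are placeholders, not values from the question.

from pyspark.sql import SparkSession

# Sketch only: host, bucket, database, and collection names are placeholders.
# The coordinate uses the Scala 2.12 / connector 3.0.1 combination named above;
# in groupId:artifactId:version form, Spark pulls in the MongoDB Java driver
# as a transitive dependency of the connector.
spark = (SparkSession.builder
         .master("yarn")
         .appName("demo")
         .config("spark.jars.packages",
                 "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")
         .config("spark.mongodb.output.uri",
                 "mongodb://my_host:27017/people_db")
         .getOrCreate())

df = spark.read.format("csv").option("header", True).load("gs://my-bucket/people.csv")

# With both the connector and the driver on the classpath, the write goes through.
(df.write
   .format("mongo")
   .mode("append")
   .option("database", "people_db")
   .option("collection", "people")
   .save())

On Dataproc, the same coordinate can also be supplied at job-submit time (for example via the spark.jars.packages property of the job) instead of in the session builder.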

Sources

[1] Stack Overflow

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.