ServerSelectionTimeoutError when inserting documents in DocumentDB

I have an S3 bucket containing either

  • JSON files of perfumes from one brand, or
  • folders, one per brand, containing perfumes in JSON format.

I know how to list their keys, but I would like to insert these objects into my DocumentDB database, in collections corresponding to their brand.
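The two layouts above can be mapped to a collection name with a small helper; the loop in the script below only handles the folder case (`'/' in keyName`). This is a hypothetical sketch (`collection_for_key` is not part of the original script), assuming that root-level files are named after their brand, e.g. `dior.json`, and that only `.json` keys should be imported.

```python
from typing import Optional


def collection_for_key(key: str) -> Optional[str]:
    """Return the collection (brand) name for an S3 key, or None to skip it.

    Hypothetical layout assumptions:
      "brand/perfume.json" -> "brand"  (folder of one brand)
      "brand.json"         -> "brand"  (one file per brand at the root)
    Folder placeholder keys such as "brand/" and non-JSON keys are skipped.
    """
    if key.endswith("/") or not key.lower().endswith(".json"):
        return None
    if "/" in key:
        # Only the top-level folder is used as the brand name.
        return key.split("/", 1)[0]
    # Root-level file: strip the ".json" extension.
    return key[: -len(".json")]
```

For example, `collection_for_key("chanel/no5.json")` and `collection_for_key("chanel.json")` both give `"chanel"`, so both layouts land in the same collection.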

import json
import boto3
import pymongo


def iterate_bucket_items(bucket):
    """
    Generator that iterates over all objects in a given s3 bucket

    See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2 
    for return data format
    :param bucket: name of s3 bucket
    :return: dict of metadata for an object
    """
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)

    for page in page_iterator:
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


##Create a MongoDB client, open a connection to Amazon DocumentDB as a replica set and specify the read preference as secondary preferred
client = pymongo.MongoClient('mongodb://user:<password>@datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')

##Specify the database to be used
db = client.perfumes
##S3 resource used to read each object's body
s3 = boto3.resource('s3')
c = 0
for i in iterate_bucket_items(bucket='datahubpredicity'):
    keyName = i['Key']
    print(keyName)
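    # Compare by value ("is not" checks identity); skip folder placeholder keys
    if '/' in keyName and not keyName.endswith('/'):
        print("keyName: ", keyName)
        # The top-level folder name is the brand, i.e. the collection name
        folder = keyName.split('/', 1)[0]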
        ##Specify the collection to be used
        col = db[folder]
        content_object = s3.Object('datahubpredicity', keyName)
        file_content = content_object.get()['Body'].read().decode('utf-8')
        json_content = json.loads(file_content)
        print(json_content)
        ##Insert a single document
        col.insert_one(json_content)
    c+=1
    if c >= 6:
        break
    
##Close the connection
client.close()
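Separately from the connection error, `insert_one` only accepts a single document (a dict), so a file holding a JSON array of perfumes would be rejected. A minimal sketch, assuming each file contains either one perfume object or an array of perfume objects (`to_documents` is a hypothetical helper, not part of the original script):

```python
import json
from typing import Any, Dict, List


def to_documents(raw: bytes) -> List[Dict[str, Any]]:
    """Decode one S3 object body into a list of MongoDB documents.

    Assumption: the file holds a single JSON object or a JSON array of
    objects; any other shape raises ValueError.
    """
    data = json.loads(raw.decode("utf-8"))
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list) and all(isinstance(d, dict) for d in data):
        return data
    raise ValueError("expected a JSON object or an array of objects")
```

Inside the loop, `docs = to_documents(content_object.get()['Body'].read())` followed by `if docs: col.insert_many(docs)` would replace the `json.loads`/`insert_one` pair; the `if` guard is there because PyMongo's `insert_many` rejects an empty list.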

But it raises:

pymongo.errors.ServerSelectionTimeoutError: 
datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017: 
timed out, Timeout: 30s, 
Topology Description: <TopologyDescription id: 6254472217824b192df5665d, 
                      topology_type: ReplicaSetNoPrimary, 
                      servers: [<ServerDescription ('datahub.cluster-1.eu-west-3.docdb.amazonaws.com', 27017) server_type: Unknown, 
                      rtt: None, error=NetworkTimeout('datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017: timed out')>]>
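In the topology description the only server has `server_type: Unknown` and `error=NetworkTimeout(...)`, which generally means the client never completed a TCP connection to port 27017, before TLS or credentials come into play. Amazon DocumentDB clusters are only reachable from inside their VPC (or through a tunnel or VPN into it), so running the script from outside, or with a security group that blocks 27017, typically fails this way. A plain socket connect from the same machine is a quick way to check; `can_reach` is a hypothetical helper sketched for illustration:

```python
import socket


def can_reach(host: str, port: int = 27017, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False
```

If `can_reach("datahub.cluster-1.eu-west-3.docdb.amazonaws.com")` returns `False`, changing connection-string options alone is unlikely to help; the network path (VPC, security group inbound rule for 27017) is the thing to check.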


Solution 1:[1]

It appears the script can't connect to DocumentDB. Your client connection string looks a bit wrong; I'm not sure where you got the ssl_ca_certs parameter. You should have something like this:

client = pymongo.MongoClient('mongodb://<sample-user>:<password>@datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false') 
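If the connection string is assembled in code, the credentials need percent-escaping: PyMongo expects characters such as `@` or `:` in the username or password to be escaped with `urllib.parse.quote_plus`. A minimal sketch that builds the string above (`build_docdb_uri` is a hypothetical helper; the option values are copied from the example):

```python
from urllib.parse import quote_plus


def build_docdb_uri(user: str, password: str, host: str,
                    ca_file: str = "rds-combined-ca-bundle.pem",
                    port: int = 27017) -> str:
    """Build a DocumentDB connection string with the options shown above."""
    # Escape credentials so characters like "@" or ":" don't break parsing.
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    options = (
        f"tls=true&tlsCAFile={ca_file}"
        "&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
    )
    return f"mongodb://{creds}@{host}:{port}/?{options}"
```

The result can be passed to `pymongo.MongoClient(uri, serverSelectionTimeoutMS=5000)`, and calling `client.admin.command("ping")` right away surfaces a connection problem in seconds instead of on the first insert after 30 s.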

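This assumes the .pem file is in the same folder as the Python script and that the security group for DocumentDB is configured correctly. (Strictly, a relative tlsCAFile path is resolved against the current working directory, so it only works when you run the script from that folder.)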
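To make the CA bundle lookup independent of the directory the script is launched from, its path can be built from the script's own location. A small sketch, assuming the bundle sits next to the script; `ca_bundle_path` is a hypothetical helper:

```python
from pathlib import Path


def ca_bundle_path(script_file: str,
                   name: str = "rds-combined-ca-bundle.pem") -> str:
    """Absolute path of a CA bundle stored next to the given script file."""
    return str(Path(script_file).resolve().parent / name)
```

In the script, `ca_bundle_path(__file__)` would then be used as the `tlsCAFile` value.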

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mihai A