ServerSelectionTimeoutError when inserting documents in DocumentDB
I have an S3 bucket that contains either
- JSON files, each holding the perfumes of one brand, or
- folders, one per brand, with the perfumes in JSON format.
I know how to list their keys, but I would like to insert these objects into my DocumentDB database, in a collection per brand.
import json
import sys

import boto3
import pymongo


def iterate_bucket_items(bucket):
    """
    Generator that iterates over all objects in a given s3 bucket

    See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
    for return data format
    :param bucket: name of s3 bucket
    :return: dict of metadata for an object
    """
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)
    for page in page_iterator:
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


##S3 resource used below to download the body of each object
s3 = boto3.resource('s3')

##Create a MongoDB client, open a connection to Amazon DocumentDB as a replica set and specify the read preference as secondary preferred
client = pymongo.MongoClient('mongodb://user:<password>@datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')

##Specify the database to be used
db = client.perfumes

c = 0
for i in iterate_bucket_items(bucket='datahubpredicity'):
    keyName = i['Key']
    print(keyName)
    ##Only handle files inside a brand folder; skip the folder keys themselves
    if '/' in keyName and keyName[-1] != '/':
        print("keyName: ", keyName)
        folder, file = keyName.split('/')
        ##Specify the collection to be used
        col = db[folder]
        content_object = s3.Object('datahubpredicity', keyName)
        file_content = content_object.get()['Body'].read().decode('utf-8')
        json_content = json.loads(file_content)
        print(json_content)
        ##Insert a single document
        col.insert_one(json_content)
        c += 1
        if c >= 6:
            break

# ##Print the result to the screen
# print(x)

##Close the connection
client.close()
But it returns:
pymongo.errors.ServerSelectionTimeoutError:
datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017:
timed out, Timeout: 30s,
Topology Description: <TopologyDescription id: 6254472217824b192df5665d,
topology_type: ReplicaSetNoPrimary,
servers: [<ServerDescription ('datahub.cluster-1.eu-west-3.docdb.amazonaws.com', 27017) server_type: Unknown,
rtt: None, error=NetworkTimeout('datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017: timed out')>]>
Solution 1:[1]
It appears the script can't connect to DocumentDB. Your client connection string looks a bit off: ssl_ca_certs is the older PyMongo 3.x option name, and PyMongo 4 replaced it with tlsCAFile (and ssl with tls). You should have something like this:
client = pymongo.MongoClient('mongodb://<sample-user>:<password>@datahub.cluster-1.eu-west-3.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')
This assumes the .pem file is in the same folder as the Python script and that the security group for DocumentDB is configured correctly.
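Both assumptions can be checked before PyMongo is involved at all. The NetworkTimeout in the question's traceback suggests the TCP connection was never established. That would be consistent with a security group or VPC reachability problem rather than a bad option name, although this is an inference from the trace and not something the question confirms. Below is a minimal sketch using only the standard library. The helper names build_docdb_uri and can_reach are hypothetical, and the option values are copied from the connection string above.

```python
import socket
from urllib.parse import quote_plus


def build_docdb_uri(user, password, host, port=27017,
                    ca_file="rds-combined-ca-bundle.pem"):
    """Assemble a DocumentDB URI with the tls/tlsCAFile option names.

    Hypothetical helper: the option values mirror the connection string
    shown in the answer and are not otherwise verified.
    """
    # Percent-encode the credentials so characters such as '@' or '/'
    # in a password cannot break URI parsing.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    options = (
        f"tls=true&tlsCAFile={ca_file}"
        "&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
    )
    return f"mongodb://{credentials}@{host}:{port}/?{options}"


def can_reach(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This only tests network reachability (DNS, routing, security group);
    it does not perform the TLS handshake or authenticate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False
```

Run can_reach('datahub.cluster-1.eu-west-3.docdb.amazonaws.com', 27017) on the same machine as the script. If it returns False, no connection string will help until the network path is fixed, for example by running the script inside the cluster's VPC or opening port 27017 in the security group to that machine. If it returns True, pass the result of build_docdb_uri(...) to pymongo.MongoClient and look at the TLS settings next.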
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mihai A |
