Access MongoDB from AWS Glue scripts

My job reads data from S3 and performs updates/inserts into the corresponding MongoDB collection. Our MongoDB server is hosted on an EC2 instance in the same region and VPC as AWS Glue.

# Read data from S3
obj = self.s3_client.get_object(Bucket=bucket_name, Key=file_name)
df = pd.read_csv(obj['Body'], compression='gzip', sep=',')

# Some validations/transformation

# Mongo Client
from pymongo import MongoClient
client = MongoClient('mongodb://uname:<password>@<host>:27017/db_name')
db = client[DB_NAME]

# Insert records
db[collection_name].insert_many(new_records_dict)

# Update records
db[collection_name].update_one({'ID': id}, {'$set': {'TITLE': title_to_use}})

When I run the above script from AWS Glue, it throws the following error:

ServerSelectionTimeoutError: 55.xx.xxx.xx:27017: timed out, Timeout: 30s, Topology Description: <TopologyDescription id: 622f13ebddb0b336xxxxx, topology_type: Unknown, servers: [<ServerDescription ('55.xx.xxx.xx', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('55.xx.xxx.xx:27017: timed out')>]>

The same script works from my local machine (over VPN). Is there a way to allow connections from AWS Glue to the MongoDB server, for example by allowing a specific security group?
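Since the error is a network timeout rather than an authentication failure, a plain TCP check run from inside the Glue job may help confirm whether the MongoDB port is reachable at all before PyMongo is involved. The following is a minimal sketch using only the standard library; the helper name `can_reach` and the default timeout are illustrative assumptions, not part of the original script.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `can_reach` is a hypothetical helper for diagnosis only; it does not
    speak the MongoDB protocol, it only checks that the port accepts connections.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

Calling `can_reach(mongo_host, 27017)` at the start of the Glue job and logging the result would show whether the failure happens at the network level; if it returns False there but True from the VPN machine, the cause is more likely routing or security group rules than the script itself.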

I also tried creating a connection from the Glue console, but that results in the error below.

Check that your connection definition references your Mongo database with correct URL syntax, username, and password.
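One thing that may be worth ruling out for the URL-syntax message above: in a MongoDB connection string, reserved characters in the username or password (such as `@`, `:` or `/`) have to be percent-encoded. Whether that applies here is an assumption, since the real credentials are not shown. A minimal sketch, with `build_mongo_uri` as a hypothetical helper name:

```python
from urllib.parse import quote_plus


def build_mongo_uri(username: str, password: str, host: str, port: int, db_name: str) -> str:
    """Build a mongodb:// URI with percent-encoded credentials.

    `build_mongo_uri` is an illustrative helper, not part of PyMongo or AWS Glue.
    """
    # quote_plus escapes reserved characters such as '@', ':' and '/'.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/{db_name}"
```

For example, a password of `p@ss:word/1` would appear in the URI as `p%40ss%3Aword%2F1`, so the only unescaped `@` is the one separating the credentials from the host.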

The relevant Stack Overflow post I found has no solution for this error.


The official documentation also makes no mention of MongoDB connection properties. Has MongoDB support been removed from AWS Glue? https://docs.aws.amazon.com/glue/latest/dg/connection-defining.html



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
