Restricting triggering of a GCP Cloud Function to a certain IP range
I use GCP Cloud Functions to trigger actions based on HTTP requests from another system. I know the IP range the requests come from, and I want the actions to be triggered only by requests from this range.
How should I go about this? I have tried adding the IP range as a subnet to my default VPC network and restricting the Cloud Function to "Allow internal traffic only", but the requests are rejected with a 403 error.
Thank you in advance.
Solution 1:[1]
I personally would NEVER recommend using .toArray() to query an indefinite/very large number of documents, as you're asking Node.js to fetch 3 million documents from your db and store the whole thing in one humongosaurus array in memory. Hope you have a high-spec computer. Also, it won't write anything to the file until the whole thing is in memory, if that ever happens.
Instead, I would iterate the cursor so that at any given point I only have one doc in memory, and pipe that to the write stream.
Since you are reading 3 million documents, it is going to take time; it's just a matter of how much. At least with this approach, you're writing the documents to the CSV file as they're being read, instead of writing the whole thing at once from memory.
I wrote a small script to iterate over the whole thing via cursor.next(), document-by-document, and tested it.
const { MongoClient } = require('mongodb');
const csv = require('fast-csv');
const fs = require('fs');
// or as an ES module:
// import { MongoClient } from 'mongodb';

// Connection URL
const url = 'supersecretmongourlhere';
const client = new MongoClient(url);

// Database name
const dbName = 'db_name_here';

const run = async () => {
  try {
    await client.connect();
    console.log('db connected');
    const db = client.db(dbName);
    console.time('X');

    const pipelineStages = [
      {
        $match: {
          // your pipeline here
        }
      },
      { $sort: { _id: -1 } },
      { $limit: 50000 }
    ];

    const cursor = db.collection('collection_name_here').aggregate(pipelineStages);

    const csvStream = csv.format({ headers: true });
    const writeStream = fs.createWriteStream('./myfile.csv');
    // 'finish' (not 'end') is the event a writable stream emits once
    // all data has been flushed to disk.
    csvStream
      .pipe(writeStream)
      .on('finish', () => console.log('DONE'))
      .on('error', (err) => console.error(err));

    while (await cursor.hasNext()) {
      const doc = await cursor.next();
      csvStream.write(doc);
      console.log(`${doc._id} written`);
    }

    console.timeEnd('X');
    // Ending the CSV stream also ends the piped write stream.
    csvStream.end();
  } catch (e) {
    console.error(e);
  } finally {
    await client.close();
  }
};

run();
For 50000 documents it took 1:17.987 = 77987 ms, so 77987 / 50000 ≈ 1.56 ms per document. 1.56 ms × 3,000,000 = 4,680,000 ms = 4680 s ≈ 78 minutes.
Is that acceptable in your use case?
Yes, I know this isn't a very fast solution but it will work.
There might be some way to speed it up further by fetching docs in batches from the db, will look into it if I can.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow