Running thousands of scheduled jobs in AWS on a regular cadence?
I'm architecting an application in AWS and am looking into the options AWS offers for running short-lived, one-off jobs on a regular schedule.
For example, we have a task that needs to run every 5 minutes: it calls an external API, interprets the data, and then possibly stores some new information in a database. Each run is expected to take 30 seconds or so. Where this gets a little more complex is that we're running a multi-tenant application, and this task needs to be performed for each tenant individually. Having a single process perform the task for each tenant in sequence doesn't satisfy our users' requirements. The task must be performed every x minutes (sometimes as often as every minute), and it must complete for all tenants in roughly the time it takes to perform the task exactly once. In other words, if all of our, let's say, 200 tenants need the task run for them at midnight, every tenant's run must complete in the time it takes to query the API and update the database for one tenant.
To add to the complexity a bit, this is not the only task we will be running on a regular schedule for our tenants. In the end we could have dozens of unique tasks, each running for hundreds of tenants, resulting in thousands or tens of thousands of unique concurrent tasks.
I've looked into ECS Scheduled Tasks, which uses CloudWatch Events (now EventBridge), but EventBridge has a limit of 300 rules per event bus. I think that means we're out of luck if we need 10,000 rules (one for each task type times the number of tenants), but I'm honestly not sure whether each account gets its own event bus or whether that's divided up differently.
In any case, even if this did work, having 10,000 different rules set up in EventBridge is still not a very attractive option to me; it feels like it would be difficult to manage. So I'm now leaning toward creating a single EventBridge rule per task type that kicks off a parent task, which in turn asynchronously starts as many instances of a child task as are needed, one per tenant. This would limit our EventBridge rules to a few dozen. Each rule, when triggered, would spawn a task for each tenant, and all of those tasks could run concurrently. I'm not 100% sure what kind of compute the parent would spawn. It wouldn't be a Lambda, since that would easily cause us to hit the 1,000 concurrent Lambda execution limit, but it might be something like a Fargate ECS task that executes for a few seconds and then goes away when it has completed.
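To make the parent/child idea concrete, here is a minimal Python sketch of what the parent might do. The function name `fan_out`, the child name `tenant-task-worker`, and the payload shape are all made up for illustration. The launcher is passed in as a callable so the same loop could start a Lambda, a Fargate task, or anything else; the boto3-backed launcher shown is just one possibility and assumes the child is a Lambda function.

```python
import json
from typing import Callable, Iterable


def fan_out(task_name: str,
            tenant_ids: Iterable[str],
            invoke_async: Callable[[str, bytes], None],
            child_function: str = "tenant-task-worker") -> int:
    """Start one asynchronous child per tenant and return how many were started.

    `child_function` is a hypothetical name; `invoke_async` must return
    without waiting for the child to finish.
    """
    started = 0
    for tenant_id in tenant_ids:
        # Hypothetical payload shape: which task to run, and for which tenant.
        payload = json.dumps({"task": task_name, "tenant_id": tenant_id}).encode()
        invoke_async(child_function, payload)
        started += 1
    return started


def make_lambda_invoker() -> Callable[[str, bytes], None]:
    """One possible launcher, assuming the child is a Lambda function.

    boto3 is imported lazily so the fan-out loop can be exercised
    without AWS credentials by injecting a different launcher.
    """
    import boto3

    client = boto3.client("lambda")

    def invoke(function_name: str, payload: bytes) -> None:
        # InvocationType="Event" requests an asynchronous invocation,
        # so the parent does not wait for the child's result.
        client.invoke(FunctionName=function_name,
                      InvocationType="Event",
                      Payload=payload)

    return invoke
```

Locally, the loop can be tried with a stand-in launcher such as `fan_out("sync-external-api", ["tenant-a", "tenant-b"], lambda fn, payload: print(fn, payload))`. The tenant list itself would have to come from wherever tenants are stored.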
I'd love to hear others' thoughts on these options, my current direction, and any other options I'm missing.
Solution 1:[1]
You don't necessarily need to look at ECS for this, because 1,000 invocations of a Lambda at a time is only the default concurrency limit. That is something you can request an increase for in the Service Quotas console:
"There is no maximum concurrency limit for Lambda functions. However, limit increases are granted only if the increase is required for your use case."
Source: AWS Support article.
The same goes for the 300 rules per event bus limit. That is also a default limit and can be increased upon request in the Service Quotas console.
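To get a feel for how large an increase you might need to request, a rough back-of-the-envelope calculation helps. The sketch below is only an estimate: the 24 task types are an assumed stand-in for "dozens", while the 200 tenants, 30-second duration, and 5-minute interval come from your question.

```python
def rules_needed(task_types: int, tenants: int, per_tenant_rules: bool) -> int:
    """EventBridge rules required: one per (task, tenant) pair, or one per task type."""
    return task_types * tenants if per_tenant_rules else task_types


def peak_concurrency(task_types: int, tenants: int) -> int:
    """Worst case: every task type fires for every tenant on the same tick."""
    return task_types * tenants


def average_concurrency(task_types: int, tenants: int,
                        duration_s: float, interval_s: float) -> float:
    """Steady-state estimate if starts were spread evenly over the interval
    (start rate multiplied by run duration)."""
    return task_types * tenants * duration_s / interval_s


# Illustrative numbers only: 24 task types is an assumption.
print(rules_needed(24, 200, per_tenant_rules=True))    # 4800
print(rules_needed(24, 200, per_tenant_rules=False))   # 24
print(peak_concurrency(24, 200))                       # 4800
print(average_concurrency(24, 200, 30, 300))           # 480.0
```

Because you need every tenant's run to start at the same moment, the peak figure rather than the average is the one to compare against the concurrency quota.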
Since you mentioned branching logic, I wonder if you've looked into AWS Step Functions? In particular, Express Workflows within Step Functions may suit the duration and rates of your executions.
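As a rough illustration of the Step Functions route, the sketch below builds an Amazon States Language definition with a `Map` state that runs one worker per tenant. It assumes the execution input contains a `tenants` array and that the worker is a Lambda function; the helper name, the state names, and the ARN are placeholders I made up. Treat it as a starting point and check the concurrency limits of whichever Map mode you choose.

```python
import json


def build_fan_out_definition(worker_arn: str, max_concurrency: int = 0) -> dict:
    """Return a state machine definition that maps over `$.tenants`.

    `worker_arn` is a placeholder for the task resource to run per tenant.
    A MaxConcurrency of 0 asks Step Functions not to cap parallelism itself.
    """
    return {
        "Comment": "Fan out one worker per tenant (illustrative sketch)",
        "StartAt": "PerTenant",
        "States": {
            "PerTenant": {
                "Type": "Map",
                "ItemsPath": "$.tenants",          # assumed input field
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "StartAt": "RunTenantTask",
                    "States": {
                        "RunTenantTask": {
                            "Type": "Task",
                            "Resource": worker_arn,
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


# Placeholder ARN for illustration only.
definition = build_fan_out_definition(
    "arn:aws:lambda:us-east-1:123456789012:function:tenant-task-worker"
)
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is what you would supply as the state machine definition; a single EventBridge rule per task type could then start an execution with the tenant list as input.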
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Yann Stoneman |
