Is there a way to execute distributed code in Python on AWS?
I created a scraper in Python that navigates a website. It pulls many links, and then it has to visit every link, pull the data, parse it, and store the result.
Is there an easy way to run that script distributed in the cloud (like AWS)?
Ideally, I would like something like this (it is probably more difficult in practice, but just to give an idea):
run_in_the_cloud --number-of-instances 5 scraper.py
After the process is done, the instances are terminated, so they do not cost any more money.
I remember doing something similar with Hadoop and Java using MapReduce a long time ago.
Solution 1:[1]
If you can put your scraper in a Docker image, it's relatively trivial to run and scale dockerized applications using AWS ECS on Fargate. Just create a task definition and point it at your container registry, then submit RunTask requests for however many instances you want. AWS Batch is another tool you could use to run many copies of a container in parallel.
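As a rough sketch of the RunTask approach with boto3: the snippet below builds one request per worker and passes each worker its share of the work through environment variables. The cluster, task definition, container name, and subnet values, the `SHARD_INDEX` / `NUM_SHARDS` variable names, and the helper functions are all assumptions made up for illustration. It also assumes the task definition already exists and that the scraper image reads those variables at startup.

```python
import os


def shard(items, num_shards, index):
    """Return the items that worker `index` (0-based) should handle."""
    return items[index::num_shards]


def links_for_this_worker(all_links, env=os.environ):
    """Inside the container: pick this worker's links from the env vars."""
    return shard(all_links, int(env["NUM_SHARDS"]), int(env["SHARD_INDEX"]))


def build_run_task_requests(cluster, task_definition, container_name,
                            subnets, num_workers):
    """Build one ECS RunTask request per worker (all values are placeholders)."""
    requests = []
    for index in range(num_workers):
        requests.append({
            "cluster": cluster,
            "taskDefinition": task_definition,
            "launchType": "FARGATE",
            "count": 1,
            "networkConfiguration": {
                "awsvpcConfiguration": {
                    "subnets": subnets,
                    # Assumption: tasks need a public IP to reach the website.
                    "assignPublicIp": "ENABLED",
                }
            },
            "overrides": {
                "containerOverrides": [{
                    "name": container_name,
                    "environment": [
                        {"name": "SHARD_INDEX", "value": str(index)},
                        {"name": "NUM_SHARDS", "value": str(num_workers)},
                    ],
                }]
            },
        })
    return requests


def launch(requests):
    """Submit the requests to ECS; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without AWS access
    ecs = boto3.client("ecs")
    return [ecs.run_task(**request) for request in requests]
```

Calling `launch(build_run_task_requests("my-cluster", "scraper-task", "scraper", ["subnet-..."], 5))` would then roughly correspond to the `--number-of-instances 5` idea from the question. A Fargate task stops when its container exits, so a scraper that terminates after finishing its shard should stop accruing compute charges at that point.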
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | zyd |
