CI/CD pipeline on AWS cloud for PySpark EMR application
I need to create a CI/CD pipeline in the AWS cloud for a PySpark application. In the end, this PySpark application is to be invoked through an Airflow DAG.
Solution 1:[1]
I am no expert on this either, but you can follow this guide:
The idea is to automate job testing in Spark local mode, then run a live job on infrastructure created on the fly, and finally deploy the job to production only if all the previous steps succeed. I would keep my production jobs scheduled in Airflow and run this CI/CD pipeline on development branches (without the deploy-to-production step, of course) as well as on pull requests against the main branch. That way your production jobs keep working, and they only pick up new functionality or changes after those changes have been fully tested on development branches.
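To make the gating logic concrete, here is a minimal sketch in plain Python. It is not taken from the guide: `run_pipeline`, the stage names, and the `"main"` branch name are all hypothetical, and the stage callables are placeholders for whatever your CI tool actually runs (for example, a test suite against a local-mode `SparkSession`, then a job on a short-lived EMR cluster).

```python
from typing import Callable, List, Tuple

# A stage is a (name, callable) pair; the callable returns True on success.
# All names here are hypothetical, chosen only for illustration.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(
    stages: List[Stage],
    branch: str,
    deploy: Callable[[], bool],
    main_branch: str = "main",
) -> List[str]:
    """Run stages in order, stop at the first failure, and deploy
    only when every stage passed and the branch is the main branch."""
    log: List[str] = []
    for name, stage in stages:
        if not stage():
            log.append(f"{name}: failed")
            return log  # later stages and the deploy never run
        log.append(f"{name}: ok")

    if branch == main_branch:
        log.append("deploy: ok" if deploy() else "deploy: failed")
    else:
        # Development branches are tested but never deployed.
        log.append("deploy: skipped")
    return log


if __name__ == "__main__":
    # Placeholder stages; in a real pipeline these would shell out to
    # the test runner and to the job submission step.
    stages = [
        ("local_tests", lambda: True),
        ("live_job", lambda: True),
    ]
    for line in run_pipeline(stages, branch="feature/x", deploy=lambda: True):
        print(line)
```

In practice your CI service expresses the same ordering in its own configuration format. The point of the sketch is only that a failed earlier stage must prevent the deploy, and that the deploy is reserved for the main branch.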
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ramon Soto Garcia |
