Spark submit operator in Airflow
My setup is as follows:

1. My PySpark script is in an S3 bucket.
2. Airflow is installed on one of my AWS EC2 instances.
3. I want to submit the script to an EMR cluster, which runs on separate EC2 instances.
Airflow DAG code:
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from airflow.operators.dummy_operator import DummyOperator

# `dag` is the DAG object defined earlier in this file (definition omitted).
submit_spark_CV_***** = SparkSubmitOperator(
    application='s3://****-code-backup/CV_*******_Test.py',
    conn_id='spark_default',
    task_id='submit_spark_CV_*****',
    dag=dag,
)

dummy_operator = DummyOperator(task_id='dummy_task', retries=3, dag=dag)

# Run the dummy task before the spark-submit task.
dummy_operator >> submit_spark_CV_*****
What connections are required for this to work?
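For context, the connection SparkSubmitOperator uses is the one named in conn_id (spark_default above). Below is a minimal sketch of creating it programmatically with the Airflow 1.x Connection model; the host value, extras, and paths are assumptions for illustration, and they presume the Airflow EC2 instance has a Spark client whose YARN config points at the EMR master node:

from airflow import settings
from airflow.models import Connection

# Hypothetical sketch: point spark_default at YARN on the EMR cluster.
# "yarn" as host and the extras below are assumed values, not confirmed
# from the original question.
spark_conn = Connection(
    conn_id='spark_default',
    conn_type='spark',
    host='yarn',  # master URL passed to spark-submit
    extra='{"deploy-mode": "cluster", "spark-home": "/usr/lib/spark"}',
)

# Persist the connection to the Airflow metadata database.
session = settings.Session()
session.add(spark_conn)
session.commit()

The same values can also be entered under Admin -> Connections in the Airflow UI. Note that SparkSubmitOperator shells out to the spark-submit binary, so a Spark client must be installed on the machine running the Airflow worker.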
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow