SparkSubmitOperator in Airflow

1. My Python script is in an S3 bucket.
2. Airflow is installed on one of the AWS EC2 instances.
3. I want to submit the Python script to an EMR cluster, which runs on other EC2 instances.
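For the setup above (script in S3, Airflow on a separate EC2 instance), a common alternative to SparkSubmitOperator is to add a spark-submit step directly to the EMR cluster, so the Airflow host never needs a Spark client. A minimal sketch with placeholder values (the bucket path and step name are not real); the step dict below is what would be passed as `steps=` to Airflow's `EmrAddStepsOperator`:

```python
# Sketch: a spark-submit step definition for an EMR cluster.
# The S3 path below is a placeholder, not a real bucket.

def build_spark_step(script_s3_path, name="submit_py_script"):
    """Build an EMR step that runs spark-submit on a script stored in S3.

    EMR's command-runner.jar executes spark-submit on the master node,
    so the script is pulled from S3 by the cluster itself.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
        },
    }

step = build_spark_step("s3://my-bucket/my_script.py")
# This dict (in a list) would go to EmrAddStepsOperator
# (airflow.contrib.operators.emr_add_steps_operator in old-style Airflow),
# together with job_flow_id=<cluster id> and aws_conn_id="aws_default".
print(step["HadoopJarStep"]["Args"])
```

With that approach the only Airflow connection needed is an AWS credentials connection (`aws_default`), rather than a Spark one.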

Airflow DAG code:

from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from airflow.operators.dummy_operator import DummyOperator

submit_spark_CV_***** = SparkSubmitOperator(
    application='s3://****-code-backup/CV_*******_Test.py',
    conn_id='spark_default',
    task_id='submit_spark_CV_*****',
    dag=dag,
)
dummy_operator = DummyOperator(task_id='dummy_task', retries=3, dag=dag)
dummy_operator >> submit_spark_CV_*****

What connections are required?
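For context: SparkSubmitOperator looks up the connection named in `conn_id` (here `spark_default`) and then runs `spark-submit` on the Airflow host itself, so that host needs a Spark client configured against the EMR master's YARN resource manager. A rough sketch of the fields such a connection might carry, with placeholder values rather than a definitive configuration:

```python
# Sketch of the fields a Spark connection for EMR might carry.
# These are illustrative values, not a verified setup.
spark_conn = {
    "conn_id": "spark_default",
    "conn_type": "spark",
    "host": "yarn",                      # submit through YARN on the cluster
    "extra": {"deploy-mode": "cluster"}, # run the driver on the EMR cluster
}

# In the Airflow UI this corresponds to Admin -> Connections -> spark_default.
# HADOOP_CONF_DIR / YARN_CONF_DIR on the Airflow host must point at config
# files copied from the EMR master so spark-submit can locate the cluster.
print(spark_conn["conn_id"])
```

If copying Hadoop/YARN configuration to the Airflow host is not an option, submitting steps to EMR via an AWS connection is usually the simpler route.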



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
