Airflow SparkKubernetesOperator logging
I am using KubernetesExecutor as the executor in Airflow. My DAG code:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
from airflow.providers.cncf.kubernetes.sensors.spark_kubernetes import SparkKubernetesSensor

dag = DAG(
    'spark_pi_using_spark_operator',
    default_args={'max_active_runs': 1},
    description='submit spark-pi as sparkApplication on kubernetes',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
)

t1 = SparkKubernetesOperator(
    task_id='spark_pi_submit',
    namespace="default",
    application_file="example_spark_kubernetes_spark_pi.yaml",
    do_xcom_push=True,
    dag=dag,
)

t2 = SparkKubernetesSensor(
    task_id='spark_pi_monitor',
    namespace="default",
    application_name="{{ task_instance.xcom_pull(task_ids='spark_pi_submit')['metadata']['name'] }}",
    dag=dag,
)

t1 >> t2
The DAG executes successfully. I am able to see the output in the Spark driver logs by executing kubectl logs spark-pi-driver

Solution 1:[1]
Update the SparkKubernetesSensor configuration as below. Setting attach_log=True makes the sensor append the driver pod's logs to its own task log:
t2 = SparkKubernetesSensor(
    task_id='spark_pi_monitor',
    namespace="default",
    application_name="{{ task_instance.xcom_pull(task_ids='spark_pi_submit')['metadata']['name'] }}",
    dag=dag,
    attach_log=True,
)
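To make the templated application_name easier to follow, here is a minimal plain-Python sketch of the lookup that the Jinja expression performs on the value pulled from XCom. The payload below is a hypothetical stand-in: only the 'metadata' -> 'name' path comes from the DAG above, the name spark-pi is inferred from the kubectl command in the question, and the helper resolve_application_name is invented for illustration and is not part of Airflow.

```python
# Hypothetical stand-in for the value that spark_pi_submit pushes to XCom
# when do_xcom_push=True. Only the 'metadata' -> 'name' path is taken from
# the DAG; the remaining fields are illustrative assumptions.
xcom_value = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "spark-pi", "namespace": "default"},
}


def resolve_application_name(pushed):
    """Mimic the dictionary lookup done inside the Jinja template."""
    return pushed["metadata"]["name"]


# This is the string the sensor would receive as application_name.
application_name = resolve_application_name(xcom_value)
print(application_name)
```

Note that the template needs both opening braces ({{ ... }}) to be rendered; with a single opening brace the string would most likely reach the sensor unrendered, so the application would not be found.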
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

