Migrating PySpark cron jobs scheduled on Kubernetes to Apache Airflow deployed on Kubernetes using a Helm chart

I have multiple PySpark jobs scheduled as cron jobs on a Kubernetes cluster.

I am using a PySpark image in Kubernetes for the spark-submit operation and have created a separate Kubernetes configuration file for each job.

I am also passing JAR files to spark-submit via the --jars option.
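For reference, each job is a standalone PySpark script along these lines (a trimmed-down sketch; the app name, paths, and logic here are placeholders, not my actual code):

```python
from pyspark.sql import SparkSession

# Rough shape of one of the scheduled jobs
# (app name, input/output paths, and logic are placeholders).
spark = (
    SparkSession.builder
    .appName("example-daily-job")
    .getOrCreate()
)

df = spark.read.parquet("s3a://example-bucket/input/")        # placeholder input path
result = df.groupBy("category").count()
result.write.mode("overwrite").parquet("s3a://example-bucket/output/")  # placeholder output path

spark.stop()
```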

Now I want to schedule these jobs with Apache Airflow.

I have deployed Apache Airflow on the same Kubernetes cluster using a Helm chart.

Now I am stuck on how to schedule these jobs in Apache Airflow. I do not understand how to schedule them on an Airflow instance that is itself deployed on Kubernetes.

Should I modify the PySpark scripts to use the Airflow Python library?

Or can I schedule them through the Apache Airflow UI?
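From what I have read so far, Airflow jobs are defined as Python DAG files rather than created through the UI, so I am guessing I would need something roughly like the sketch below. This assumes Airflow 2.x with the cncf.kubernetes provider installed (the import path may differ by provider version); the DAG id, schedule, image, namespace, and JAR/script paths are all placeholders, not my real values:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Sketch of how one cron job might look as a DAG; all names and paths are placeholders.
with DAG(
    dag_id="pyspark_job_example",
    schedule_interval="0 2 * * *",       # same cron expression the CronJob used
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    run_spark_job = KubernetesPodOperator(
        task_id="spark_submit",
        name="spark-submit",
        namespace="spark-jobs",                      # placeholder namespace
        image="my-registry/pyspark:latest",          # placeholder PySpark image
        cmds=["spark-submit"],
        arguments=[
            "--master", "k8s://https://kubernetes.default.svc",
            "--deploy-mode", "cluster",
            "--conf", "spark.kubernetes.container.image=my-registry/pyspark:latest",
            "--jars", "local:///opt/jars/dependency.jar",   # placeholder JAR path
            "local:///opt/app/my_job.py",                   # placeholder script path
        ],
        get_logs=True,
    )
```

My assumption is that such a DAG file would then go into the dags folder that the Helm chart mounts (for example via git-sync), but I am not sure about that either.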

I am new to Apache Airflow and cannot find a clear path for this kind of Kubernetes deployment.

Please guide me on this. I want to utilise the power of the Kubernetes cluster for these PySpark jobs.

Your help is appreciated. Thank you.



Sources

Source: Stack Overflow, licensed under CC BY-SA 3.0.