Is it possible to loop through PySpark job submissions and parameterize each submit?

I have a DAG (partial code snippet below) that submits multiple jobs to Dataproc on Google Cloud:

    for collection in collection_list:
        op = dataproc.DataprocSubmitJobOperator(
            task_id='load_collection_{}'.format(collection),
            job=PYSPARK_JOB_READ_FROM_MONGO,
        )

and I have the job template defined as:

    PYSPARK_JOB_READ_FROM_MONGO = {
        "reference": {"project_id": project_id},
        "placement": {"cluster_name": 'spark-ingest-mongodb-{{ ds_nodash }}'},
        "pyspark_job": {
            "main_python_file_uri": "gs://greenline-demo-341617-spark-src/Demo/01_MongoDB/ingest.py",
            "args": [
                "gcsPath=gs://testing_part/path/",
                "test.id=404",
                "correlation.id=AUDIT_2222",
                "b.url=http://testing-v1.net/test/"
            ]
        }
    }

Let's say I need to change the "gcsPath" argument for each submit. How do I pass this in?
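
One direction I've been considering is to deep-copy the template inside the loop and overwrite the args per collection, so that every operator gets its own job dict instead of all tasks sharing (and mutating) the same one. A minimal sketch of that idea; the `gs://testing_part/{collection}/` path layout is just a placeholder assumption:

    import copy

    for collection in collection_list:
        # Copy the shared template so each task gets its own job config
        # rather than every operator referencing the same mutable dict.
        job = copy.deepcopy(PYSPARK_JOB_READ_FROM_MONGO)
        job["pyspark_job"]["args"] = [
            "gcsPath=gs://testing_part/{}/".format(collection),  # assumed per-collection path
            "test.id=404",
            "correlation.id=AUDIT_2222",
            "b.url=http://testing-v1.net/test/"
        ]

        op = dataproc.DataprocSubmitJobOperator(
            task_id='load_collection_{}'.format(collection),
            job=job,
        )

Is this the right approach, or is there a more idiomatic way to parameterize each submit?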


