Google Dataflow: Updating an existing pipeline

I am trying to update a running job on data flow.

Following this guide: https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline

I have executed a pipeline using options created from the pom file, calling pipeline.run().

and was able to run a new job on the data flow from my custom template using

gcloud dataflow jobs run myJobName *arguments*

When I try to update the job, I add the following two arguments, as mentioned in the guide: <argument>--update</argument> <argument>--jobName=${jobName}</argument>
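For reference, the equivalent launch from the command line with the Maven exec plugin looks roughly like this (a minimal sketch; the main class, project, and region are placeholders, not values from my actual setup):

    # Hypothetical launch command; main class, project, and region are placeholders.
    mvn compile exec:java \
      -Dexec.mainClass=com.example.MyPipeline \
      -Dexec.args="--runner=DataflowRunner \
        --project=my-project \
        --region=us-central1 \
        --update \
        --jobName=${jobName}"

According to the guide, with --update set, the launched pipeline should replace the running job whose name matches --jobName instead of creating a separate job.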

I am executing the pipeline (using pipeline.run()) and then I want to update the old job with the new template.

I can see that my new template is there, and I can create new jobs from it using the command:

gcloud dataflow jobs run myJobName *arguments*

But all I get is a new job and my old job is not updated.

Did I miss anything? When the guide refers to "launch a new job", does it mean executing the pipeline (using pipeline.run()) or running a job from the new template?



Solution 1:[1]

I've been reading through these same docs while setting up a CI/CD process to deploy updates to my Dataflow streaming jobs.

I believe this is expected behavior, since gcloud dataflow jobs run ... will:

  • Drain the existing job with the given job_name
  • Create a new job with the same job_name (but a new job_id)
  • Start up the new job_id

As part of the drain, all messages currently being processed will finish, preventing duplicates and misses (unintended ACKs).
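For what it's worth, a drain can also be triggered explicitly before redeploying; a sketch, with a placeholder job ID and region:

    # Drain a running job explicitly (job ID and region are placeholders).
    gcloud dataflow jobs drain 2022-01-01_00_00_00-1234567890123456789 \
      --region=us-central1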

"But all I get is a new job and my old job is not updated. Did I miss anything?"

For reference, the commands I run as part of my deploy process are:

  1. To build and upload the template:
        echo "Deploying ApacheBeam template to GCS..."
        python3 ${_SCRIPT_LOCATION} \
        --project=${_PROJECT_ID} \
        --template_location=${_TEMPLATE_LOCATION} \
        --temp_location=${_TEMPORARY_LOCATION} \
        --region=us-central1 \
        --runner=DataflowRunner \
        --staging_location=${_STORAGE_LOCATION} \
        --streaming \
        --update \
        --job_name=${_JOB_NAME}
  2. To recreate/start ${_JOB_NAME} using the template:
        echo -e "Updating ${_JOB_NAME} to point at new ApacheBeam template.."
        gcloud dataflow jobs run ${_JOB_NAME} --gcs-location ${_TEMPLATE_LOCATION}
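As a sanity check after step 2, something like the following can confirm that a new job_id is now active for the job name while the old one drains (the region here is an assumption about your setup):

    # List active Dataflow jobs and look for the job name (region is an assumption).
    gcloud dataflow jobs list --region=us-central1 --status=active | grep ${_JOB_NAME}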

Please let me know if this helps, or ask a follow-up question in the comments below.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: mts