Google Dataflow: Updating an existing pipeline
I am trying to update a running job on Dataflow.
Following this guide: https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline

I have executed a pipeline using options created from the pom file (via pipeline.run()), and was able to run a new job on Dataflow from my custom template using:

gcloud dataflow jobs run myJobName *arguments*
When trying to update the job, I add the following two arguments to the pom, as mentioned in the guide: <argument>--update</argument> <argument>--jobName=${jobName}</argument>
I then execute the pipeline (using pipeline.run()), expecting the old job to be updated with the new template.
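For illustration, the same two options can also be passed straight on the command line instead of through the pom; this is only a sketch, and the main class, project, and region below are placeholders rather than values from the original setup:

# Hypothetical direct launch: --update and --jobName are pipeline options that
# take effect when pipeline.run() submits the job to the Dataflow service
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner --project=my-project --region=us-central1 --update --jobName=myJobName"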
I can see that my new template is there, and I can create new jobs from it using the command:
gcloud dataflow jobs run myJobName *arguments*
But all I get is a new job and my old job is not updated.
Did I miss anything? When the guide refers to "launching a new job", does it mean executing the pipeline (using pipeline.run()) or running a job from the new template?
Solution 1:[1]
I've been reading through these same docs while setting up a CI/CD process to deploy updates to my Dataflow streaming jobs.
I believe this is expected behavior, since gcloud dataflow jobs run ... will:
- Drain the existing job with the `job_name`
- Create a new job with the same `job_name` (but a new `job_id`)
- Start up the new `job_id`
As part of the drain, all messages currently being processed will finish, preventing duplicates and misses (unintended ACKs).
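If you prefer to trigger that step yourself, the drain can also be issued manually before launching the replacement job. A minimal sketch; the region and the JOB_ID placeholder are assumptions, not values from the original deploy:

# List active jobs to find the job ID (the job name is reused across updates,
# but each run gets a new job ID)
gcloud dataflow jobs list --region=us-central1 --status=active

# Drain the running job: in-flight elements are processed, then the job stops
gcloud dataflow jobs drain JOB_ID --region=us-central1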
> But all I get is a new job and my old job is not updated.
> Did I miss anything?
For reference, the commands I run as part of my deploy process are:
- To build and upload the template:
echo "Deploying ApacheBeam template to GCS..."
python3 ${_SCRIPT_LOCATION} \
--project=${_PROJECT_ID} \
--template_location=${_TEMPLATE_LOCATION} \
--temp_location=${_TEMPORARY_LOCATION} \
--region=us-central1 \
--runner=DataflowRunner \
--staging_location=${_STORAGE_LOCATION} \
--streaming \
--update \
--job_name=${_JOB_NAME}
- To recreate/start `Job_Name` using the template:
echo -e "Updating ${_JOB_NAME} to point at new ApacheBeam template.."
gcloud dataflow jobs run ${_JOB_NAME} --gcs-location ${_TEMPLATE_LOCATION}
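If the template takes runtime parameters, the same command accepts them via --parameters, and a --region can be set explicitly. A sketch with assumed parameter names, not values from the original pipeline:

# Hypothetical launch of the templated job with an explicit region and runtime parameters
gcloud dataflow jobs run ${_JOB_NAME} \
  --gcs-location ${_TEMPLATE_LOCATION} \
  --region us-central1 \
  --parameters inputSubscription=projects/my-project/subscriptions/my-sub,outputTable=my_dataset.my_table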
Please let me know if this helps, or ask a follow-up in the comments below.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mts |
