Ansible async_status task - error: ansible_job_id "undefined variable"

I have a 3-node Kubernetes cluster running on Ubuntu 20.04 LTS KVM guests; the KVM host is also Ubuntu 20.04 LTS. I run the playbooks on the KVM host. Here is the relevant inventory extract:

nodes:
  hosts:
    sea_r:
      ansible_host: 192.168.122.60
    spring_r:
      ansible_host: 192.168.122.92
    island_r:
      ansible_host: 192.168.122.93
  vars:
    ansible_user: root

I have been experimenting with async_status, but the following play always fails:

- name: root commands
  hosts: nodes
  tasks:
  - name: bash commands
    ansible.builtin.shell: |
      apt update
    args:
      chdir: /root
      executable: /bin/bash
    async: 2000
    poll: 2
    register: output

  - name: check progress
    ansible.builtin.async_status:
      jid: "{{ output.ansible_job_id }}"
    register: job_result
    until: job_result.finished
    retries: 200
    delay: 5

with this error:

fatal: [sea_r]: FAILED! => {"msg": "The task
includes an option with an undefined variable. 
The error was: 'dict object' has no attribute
'ansible_job_id' ...

If I instead run the following,

- name: root commands
  hosts: nodes
  tasks:
  - name: bash commands
    ansible.builtin.shell: |
      apt update
    args:
      chdir: /root
      executable: /bin/bash
    async: 2000
    poll: 2
    register: output
  - debug: msg="{{ output.stdout_lines }}"
  - debug: msg="{{ output.stderr_lines }}"

I get no errors. I also tried the following variation,

  - name: check progress
    ansible.builtin.async_status:
      jid: "{{ item.ansible_job_id }}"
    with_items: "{{ output }}"
    register: job_result
    until: job_result.finished
    retries: 200
    delay: 5

which was suggested as a solution to a similar error. That does not help either; I just get a slightly different error:

fatal: [sea_r]: FAILED! => {"msg": "The task includes
an option with an undefined variable. The error 
was: 'ansible.utils.unsafe_proxy.AnsibleUnsafeText
object' has no attribute 'ansible_job_id' ...

At the beginning and the end of the playbook, I resume and pause my 3 kvm server nodes like so:

- name: resume vms
  hosts: local_vm_ctl
  tasks:
  - name: resume vm servers
    shell: |
      virsh resume kub3
      virsh resume kub2
      virsh resume kub1
      virsh list --state-paused --state-running
    args:
      chdir: /home/bi
      executable: /bin/bash
    environment:
      LIBVIRT_DEFAULT_URI: qemu:///system
    register: output
  - debug: msg="{{ output.stdout_lines }}"
  - debug: msg="{{ output.stderr_lines }}"

and like so:

- name: pause vms
  hosts: local_vm_ctl
  tasks:
  - name: suspend vm servers
    shell: |
      virsh suspend kub3
      virsh suspend kub2
      virsh suspend kub1
      virsh list --state-paused --state-running
    args:
      chdir: /home/bi
      executable: /bin/bash
    environment:
      LIBVIRT_DEFAULT_URI: qemu:///system
    register: output
  - debug: msg="{{ output.stdout_lines }}"
  - debug: msg="{{ output.stderr_lines }}"

but I don't see how these plays could have anything to do with the error above.

Any help will be much appreciated.



Solution 1:[1]

You get an undefined-variable error for your job id because:

  1. You used poll: X on your initial task, so Ansible connects every X seconds to check whether the job has finished.
  2. By the time Ansible exits that task and enters your async_status task, the job is already done. And because you used a non-zero poll value, the async status cache was automatically cleared.
  3. Since the cache was cleared, the job id no longer exists.
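You can see the difference by debugging the registered variable. With poll: 0 the registered result contains the job handle (keys such as ansible_job_id, results_file, started, finished); with a non-zero poll it contains the final command result (stdout, stderr, rc) and no job id, which is why the debug of output.stdout_lines in the question worked. A minimal sketch (the sleep command is only a placeholder):

```
- name: fire and forget
  ansible.builtin.shell: sleep 5
  async: 60
  poll: 0
  register: job

- name: the job handle is available immediately
  ansible.builtin.debug:
    var: job.ansible_job_id
```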

The scenario above is meant to avoid connection timeouts with your target on long-running tasks, not to run tasks concurrently and check their status at a later point. For that second use case, you need to run the async task with poll: 0 and clean up the cache yourself.

See the Ansible documentation on asynchronous actions and polling for more explanation of these concepts.

I took your task above and fixed it to use the dedicated apt module (note that you could add a name option to the module with one package or a list of packages, and Ansible would do both the cache update and the install in a single step). Also, retries * delay on the async_status task should be equal to or greater than async on the initial task if you want to be sure you won't miss the end of the job.

- name: Update apt cache
  ansible.builtin.apt:
    update_cache: true
  async: 2000
  poll: 0
  register: output

- name: check progress
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 400
  delay: 5

- name: clean async job cache 
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
    mode: cleanup

This pattern is more useful for launching a bunch of long-lasting tasks in parallel. Here is a contrived yet functional example:

- name: launch some loooooong tasks
  shell: "{{ item }}"
  loop:
    - sleep 30
    - sleep 20
    - sleep 35
  async: 100
  poll: 0
  register: long_cmd

- name: wait until all commands are done
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: async_poll_result
  until: async_poll_result.finished
  retries: 50
  delay: 2
  loop: "{{ long_cmd.results }}"

- name: clean async job cache
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ long_cmd.results }}"

Solution 2:[2]

You have poll: 2 on your task, which tells Ansible to internally poll the async job every 2 seconds and return the final status in the registered variable. In order to use async_status you should set poll: 0 so that the task does not wait for the job to finish.
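Applied to the play from the question, the fix is a one-line change (a sketch, keeping the shell task exactly as in the question; retries has been raised so that retries * delay covers the async timeout):

```
- name: bash commands
  ansible.builtin.shell: |
    apt update
  args:
    chdir: /root
    executable: /bin/bash
  async: 2000
  poll: 0    # fire and forget; the registered result now carries ansible_job_id
  register: output

- name: check progress
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 400
  delay: 5
```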

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1
[2] Solution 2: flowerysong