Ansible hangs after executing on the first node of the group

I am trying to install RKE2 on Ubuntu machines. The installation starts by setting up the first node as a master node; the remaining nodes are then joined to that first master node.

I have created two plays, using one role, for adding the first master node and for adding the rest of the master nodes. The first play succeeds every time, but the problems start with the second play.

The second play sometimes runs properly but frequently hangs. I thought it was getting stuck while gathering facts, but it still hangs even after disabling fact gathering.

I have tried changing several options in the "/etc/ansible/ansible.cfg" file, such as control_path, control_path_dir, ControlPersist and ServerAliveInterval, but none of them gave me consistent results.
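For reference, the options named above would sit in ansible.cfg roughly as in the sketch below. This is a reconstruction based on the values visible in the logged ssh command further down (ServerAliveInterval=30, ControlPersist=60s, ControlPath under ~/.ansible/cp), not a copy of the actual file, so the exact lines are an assumption.

```ini
# Sketch of /etc/ansible/ansible.cfg, reconstructed from the -vvvv output.
# The real file may differ; values here only mirror what the ssh command shows.
[ssh_connection]
# Extra arguments passed to ssh; ControlPersist and ServerAliveInterval are set here
ssh_args = -C -o ServerAliveInterval=30 -o ControlMaster=auto -o ControlPersist=60s
# Directory that holds the ControlMaster sockets
control_path_dir = ~/.ansible/cp
# Socket name template; %% escapes the percent sign, giving <dir>/%h-%r
control_path = %(directory)s/%%h-%%r
```

With these values, the socket path resolves to the ControlPath=/home/ubuntu/.ansible/cp/%h-%r seen in the failure snippet.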

I am able to SSH to all the nodes, and the SSH connection is established very quickly. I don't see much useful information in the -vvvv output, so please suggest steps to debug the cause of this issue.

NOTE: Ansible hangs frequently, but sometimes all the plays run to completion without getting stuck.

Ansible version:

ansible 2.9.27
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ubuntu/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ubuntu/.local/lib/python3.8/site-packages/ansible
  executable location = /home/ubuntu/.local/bin/ansible
  python version = 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]

OS details of the nodes:

NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Failure Snippet:

PLAY [kubemasters[1:3]] *************************************************************************************************************************************
META: ran handlers

TASK [launch_server_node : copy the rke2 artifacts to node's home path] *************************************************************************************
task path: /rancher/rke2_offline_installer/roles/launch_server_node/tasks/launch_server_node.yaml:3
<master2> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<master2> SSH: EXEC ssh -vvv -C -o ServerAliveInterval=30 -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="./<file>.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ControlPath=/home/ubuntu/.ansible/cp/%h-%r master2 '/bin/sh -c '"'"'echo ~ubuntu && sleep 0'"'"''
<master3> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<master3> SSH: EXEC ssh -vvv -C -o ServerAliveInterval=30 -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="./<file>.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ControlPath=/home/ubuntu/.ansible/cp/%h-%r master3 '/bin/sh -c '"'"'echo ~ubuntu && sleep 0'"'"''


^C [ERROR]: User interrupted execution

real    17m45.204s
user    1m43.477s
sys     0m22.691s

inventory.ini

[kubemasters]
master1
master2
master3

