Supervisord monitoring of a grandchild process

We have a service, xyz, that is started by a script, run-xyz. (The script performs a number of pre-activation steps before it starts xyz.)

Our goal is to monitor the xyz process, so we run it under supervisord with the following configuration:

[program:xyz]
command         = /path/to/run-xyz
numprocs        = 1
priority        = 100
startsecs       = 10
startretries    = 10
stopwaitsecs    = 60
exitcodes       = 0
autostart       = false
autorestart     = true
redirect_stderr = true
stdout_logfile  = /path/to/log

The script run-xyz starts the xyz service and then enters an infinite loop that checks whether the xyz process is still present. For example, if xyz receives a SIGKILL, the loop ends, the script exits, and supervisord automatically restarts [program:xyz], after which things are back to normal.
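
To make the behaviour above concrete, here is a minimal sketch of such a wrapper. It is written in Python purely for illustration; the real run-xyz may be in another language, and the path /path/to/xyz as well as the function name run_and_watch are assumptions, not taken from our setup.

```python
import subprocess
import sys
import time


def run_and_watch(argv, poll_interval=1.0):
    """Start the service, block until it is gone, and return its exit status."""
    child = subprocess.Popen(argv)
    # poll() returns None while the child is still running.
    while child.poll() is None:
        time.sleep(poll_interval)
    return child.returncode


if __name__ == "__main__":
    # Pre-activation steps would run here (omitted).
    # /path/to/xyz is a hypothetical location for the service binary.
    status = run_and_watch(["/path/to/xyz"])
    # Leaving the loop ends the script, which is what supervisord notices.
    sys.exit(0 if status == 0 else 1)
```

Because xyz is a direct child of the wrapper here, the wrapper can ask the operating system about it directly. That stops being possible once the wrapper dies and xyz is re-parented to pid 1.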

However, if the run-xyz process itself gets killed, xyz becomes an orphan with parent process ID 1. Supervisord then restarts [program:xyz], but xyz is already running and should not be killed and restarted, so the new run-xyz simply exits. This repeats until startretries is exhausted, at which point [program:xyz] enters the FATAL state and supervisord no longer monitors it.

I have the following questions:

  1. Is there a way for run-xyz to become the parent process of xyz when it finds xyz already running with parent process ID 1?
  2. Can supervisord monitor the pid of xyz and restart [program:xyz] when that pid no longer exists? We do not care whether run-xyz is running, as its only job is to get xyz started. (We tried the pidproxy route, but it did not work; details below.)
  3. Is supervisord the right choice for what we are trying to achieve?
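
One direction related to questions 1 and 2, sketched under assumptions: as far as I know there is no portable way for a process to become the parent of one that is already running, but a wrapper could still watch an orphaned xyz by polling its pid from the pid file and stay in the foreground for as long as that pid exists. The helper names below (pid_alive, read_pid, running_pid, watch_pid) are hypothetical, and Python is used only for illustration.

```python
import os
import sys
import time


def pid_alive(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(pidfile):
    """Return the pid stored in pidfile, or None if it is missing or malformed."""
    try:
        with open(pidfile) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def running_pid(pidfile):
    """Return the pid from pidfile if that process is alive, else None."""
    pid = read_pid(pidfile)
    # Guard against 0 and negative values, which os.kill treats as groups.
    if pid is not None and pid > 0 and pid_alive(pid):
        return pid
    return None


def watch_pid(pid, poll_interval=1.0):
    """Block until no process with this pid exists."""
    while pid_alive(pid):
        time.sleep(poll_interval)


if __name__ == "__main__":
    pid = running_pid("/path/to/xyz.pid")
    if pid is None:
        # A real wrapper would run the pre-activation steps and start xyz here.
        sys.exit("xyz is not running")
    watch_pid(pid)  # keep the supervised process alive while xyz lives
    sys.exit(1)     # xyz is gone; exiting lets supervisord restart the program
```

Polling by pid has a known weakness: pids can be reused, so a stale pid file may point at an unrelated process. A more careful version would also verify the process name or start time.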

Using pidproxy, the configuration became:

[program:xyz]
command         = /path/to/pidproxy /path/to/xyz.pid /path/to/run-xyz
numprocs        = 1
priority        = 100
startsecs       = 0
startretries    = 10
stopwaitsecs    = 60
exitcodes       = 0
autostart       = false
autorestart     = true
redirect_stderr = true
stdout_logfile  = /path/to/log

With the above configuration, we removed the infinite loop from run-xyz, so that it starts xyz, sleeps for a few seconds (30), and exits. However, supervisord does not wait: it treats the exit of run-xyz as the program having exited and spawns it again, and this repeats until the retries are exhausted and the program reaches the FATAL state. Note: /path/to/xyz.pid exists, and the pid is written to it.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
