Supervisord monitoring of a grandchild process
We have a service xyz that is started by a script, run-xyz, which performs a number of pre-activation steps before starting xyz.
Our goal is to monitor the xyz process, so we run the script under supervisord with the following configuration:
[program:xyz]
command = /path/to/run-xyz
numprocs = 1
priority = 100
startsecs = 10
startretries = 10
stopwaitsecs = 60
exitcodes = 0
autostart = false
autorestart = true
redirect_stderr = true
stdout_logfile = /path/to/log
The script run-xyz starts the service xyz and then enters an infinite loop that checks whether the xyz process is still present. For example, if xyz is killed with SIGKILL, the loop ends, the script exits, and supervisord automatically restarts [program:xyz], after which things return to normal.
However, if the run-xyz process itself is killed, xyz becomes an orphan whose parent process ID is 1. Supervisord restarts [program:xyz], but xyz should not be killed and restarted because it is already running, so the new run-xyz exits. This repeats until startretries is exhausted, [program:xyz] enters the FATAL state, and supervisord no longer monitors it.
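The start-and-poll behaviour of run-xyz described above can be sketched roughly as follows. This is only an illustration, not the actual script: it assumes xyz runs in the foreground as a direct child of the wrapper, and the name run_and_watch and the poll interval are hypothetical.

```python
import subprocess
import time


def run_and_watch(cmd, poll_interval=1.0):
    """Start cmd as a direct child and block until it exits.

    Returns the child's exit status (negative if it was killed by a signal).
    Assumption: cmd stays in the foreground and does not daemonize.
    """
    proc = subprocess.Popen(cmd)
    while True:
        status = proc.poll()  # None while the child is still running
        if status is not None:
            return status
        time.sleep(poll_interval)
```

A wrapper built this way would run its pre-activation steps, call run_and_watch(["/path/to/xyz"]), and exit once the call returns, which is the point at which supervisord sees [program:xyz] stop. It does not help with the orphan case: if the wrapper itself is killed, xyz keeps running without it.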
I have the following questions:
- Is there a way for run-xyz to make itself the parent of xyz when it finds xyz running with parent process ID 1?
- Can supervisord monitor the xyz pid and restart [program:xyz] when that pid no longer exists? We do not care whether run-xyz is running, as its only job is to get xyz started. [I tried the pidproxy route, but it did not work; I explain it below]
- Is supervisord the right choice for what we are trying to achieve?
Using pidproxy, the configuration became:
[program:xyz]
command = /path/to/pidproxy /path/to/xyz.pid /path/to/run-xyz
numprocs = 1
priority = 100
startsecs = 0
startretries = 10
stopwaitsecs = 60
exitcodes = 0
autostart = false
autorestart = true
redirect_stderr = true
stdout_logfile = /path/to/log
With the above configuration, we removed the infinite loop from run-xyz, so it now starts xyz, sleeps for about 30 seconds, and exits. However, supervisord does not keep waiting: it sees that run-xyz has exited and tries to spawn it again, and this repeats until the retries are exhausted and the program reaches the FATAL state. Note: /path/to/xyz.pid exists and the pid is written to the file.
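Supervisord tracks only the process it spawned directly, so one possible direction is a command that stays in the foreground for as long as the pid in the pid file exists. The sketch below is an assumption about how that could look, not a tested fix for this setup; the helper names read_pid, pid_alive and wait_for_exit are hypothetical.

```python
import os
import time


def read_pid(pidfile):
    """Return the integer pid stored in pidfile."""
    with open(pidfile) as fh:
        return int(fh.read().strip())


def pid_alive(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pidfile, poll_interval=1.0):
    """Block until the process named in pidfile no longer exists."""
    pid = read_pid(pidfile)
    while pid_alive(pid):
        time.sleep(poll_interval)
    return pid
```

A wrapper could call wait_for_exit("/path/to/xyz.pid") after starting xyz, or after finding it already running, and exit only when the call returns. Because the check works on any pid, not just a child, it also covers an orphaned xyz. Note that os.kill(pid, 0) still succeeds for a zombie that has not been reaped, and that a pid can be reused by an unrelated process.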
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
