monit fault tolerance cycle for a service
I'm playing with monit to see what it can do. I found that "cycle" appears in a lot of places, and I'm trying to understand whether the length of one cycle is the same for all services or depends on how each service's schedule is defined. The following example seems difficult to solve if the cycle length is a global value shared by all services.
Assume there is a program to be run once an hour. I want to be notified if it fails twice in succession (status = 1).
Assume additionally that I have set daemon 30, so the global cycle period is 30 seconds.
I then defined the service as
check program my-check path /path/to/program every 120 cycles
if status == 1 for 2 times within xxx cycles then alert
- Then what should xxx be? Should it be 120 cycles, 2 cycles, or even 121 cycles?
- I noticed that Monit only allows the cycle count to be within 1-64. What should I write to mean 2 successive times here?
- Please do not suggest increasing the cycle period, unless there is a way to use a different set daemon n for different services. Some other services need this 30-second interval.
Solution 1:[1]
You are right, the base interval is 30 seconds. If you use "every 20 cycles", the new interval becomes 10 minutes (20 * 30s). "xxx" is then counted in units of this new interval. But to count to "2 times", "xxx" should be 3 (30 minutes) or more.
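Applied to the hourly check from the question, this reasoning might look like the sketch below. It assumes the reading given above, namely that "within xxx cycles" is counted in the service's own 120-cycle interval, which I have not verified against a running Monit. The value 3 is simply the suggested minimum, and the path is the placeholder from the question.

```
# Global poll cycle: 30 seconds (needed by the other services)
set daemon 30

check program my-check with path "/path/to/program"
    # 120 * 30 s = 3600 s, so the program runs once an hour
    every 120 cycles
    # Assumption: these 3 cycles are counted in this service's own
    # hourly interval, not in the global 30-second cycle
    if status == 1 for 2 times within 3 cycles then alert
```

With these values the cycle count stays well inside the 1-64 range mentioned in the question.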
For more examples, see https://mmonit.com/monit/documentation/monit.html
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | lutzmad |
