Monit fault tolerance cycle for a service

I'm playing with Monit to see what it can do. The word "cycle" appears in a lot of places, and I am trying to understand whether the length of one cycle is the same for all services or depends on how each service's schedule is defined. The following example seems difficult to solve if the cycle length is a global value shared by all services.

Assume there is a program that runs once an hour. I want to be notified if it fails twice in succession (status = 1).

Assume additionally that I have set daemon 30, so the global cycle period is 30 seconds.

I then defined the service as:

check program my-check path /path/to/program every 120 cycles
    if status == 1 for 2 times within xxx cycles then alert
  • What should xxx be? Should it be 120 cycles, 2 cycles, or even 121 cycles?
  • I noticed that Monit only allows the cycle count to be within 1-64. What should I write to mean 2 successive times here?
  • Please do not suggest increasing the cycle period, unless there is a way to use a different set daemon n for different services. Some other services need the 30-second interval.


Solution 1:[1]

You are right, the base interval is 30 seconds. If you use "every 20 cycles", the service's own interval becomes 10 minutes (20 * 30 s). From then on, "xxx" is counted in units of this new interval, not the global one. To be able to count "2 times", "xxx" should be 3 (30 minutes in this example) or more.
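
Applied to the hourly program from the question, this could look roughly like the sketch below. It assumes, as described above, that the cycles in the "within" clause are counted per check of this service (one per hour here) and not per global 30-second cycle. The service name and path are the placeholders from the question, and the value 3 is only the suggested minimum.

```
set daemon 30    # global cycle: 30 seconds

check program my-check with path "/path/to/program"
    every 120 cycles    # 120 * 30 s = 1 hour between runs
    # assumption: "3 cycles" here means 3 runs of this check (about 3 hours)
    if status == 1 for 2 times within 3 cycles then alert
```

Under that assumption the cycle count stays well inside the 1-64 limit, and the global set daemon 30 does not need to change for the other services.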

For more examples, see the Monit documentation: https://mmonit.com/monit/documentation/monit.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 lutzmad