What's the difference between `retry_after` and `timeout` options for Laravel queues?

I ran into an issue in my codebase where a job was being terminated early even though its timeout was set to 540 seconds. After checking the documentation, I found that the cause was the retry_after parameter; increasing it to 600 seconds fixed the issue. The documentation at https://laravel.com/docs/7.x/queues states:

The --timeout value should always be at least several seconds shorter than your retry_after configuration value. This will ensure that a worker processing a given job is always killed before the job is retried. If your --timeout option is longer than your retry_after configuration value, your jobs may be processed twice.

However, I've read the documentation a couple of times now and I still can't paraphrase the difference between the retry_after and timeout options. --timeout seems to be a job-related setting, while retry_after seems to be a worker-related (process) setting. Also, --timeout can be passed as an argument to php artisan queue:work ..., but retry_after is a configuration property defined in config/queue.php.

Does anyone have experience with this configuration who can explain the difference with an example?



Solution 1:[1]

My understanding is as follows.

When a worker starts a job, it writes a timestamp to the job's reserved_at column. The reservation is treated as valid for retry_after seconds. While it is valid, other workers see the job as in progress, even though it is still in the jobs table.

If the worker itself crashes, the job stays reserved in the jobs table. That is why reserved_at is a timestamp rather than a simple boolean: other workers can tell when the reservation can no longer be trusted.

Once the reservation has expired, a worker can retry the job (assuming there are tries remaining).
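The expiry rule described above can be sketched in plain PHP. This is only an illustration, not Laravel's own code: reservationExpired is a hypothetical helper, and it assumes reserved_at holds the moment the job was reserved. How the timestamp is actually stored is an implementation detail of the queue driver.

```php
<?php

// Hypothetical helper (not part of Laravel): returns true once a reservation
// made at $reservedAt is older than $retryAfter seconds at time $now.
function reservationExpired(int $reservedAt, int $retryAfter, int $now): bool
{
    return $reservedAt + $retryAfter <= $now;
}

$reservedAt = 1000; // worker A reserved the job at t = 1000 s
$retryAfter = 90;   // retry_after value from config/queue.php

var_dump(reservationExpired($reservedAt, $retryAfter, 1050)); // bool(false): still reserved
var_dump(reservationExpired($reservedAt, $retryAfter, 1090)); // bool(true): another worker may retry
```

Because the check is purely time-based, a crashed worker never has to "unreserve" anything; the job simply becomes available again once enough time has passed.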

The timeout can be set at both the worker level (the --timeout option) and the individual job level. This is not stated in Mohamed Said's book, but according to the Laravel documentation a timeout defined on the job itself takes precedence over the one given on the command line.

So, if the timeout on the queue worker is 60 seconds, an individual, lengthy job can be given a longer timeout via its public $timeout property.
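As a minimal sketch of a job-level timeout, assuming a standard Laravel 7 job class: GenerateReport is a hypothetical name used only for illustration, and the 540 seconds is the value from the question.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;

// GenerateReport is a hypothetical job used only to show the property.
class GenerateReport implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    // Number of seconds this job may run before the worker terminates it.
    public $timeout = 540;

    public function handle()
    {
        // Long-running work goes here.
    }
}
```

Any retry_after value on the connection this job runs on would then need to be larger than 540.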

Whichever way the timeout is defined, if the job does not complete in the timeout allowed then it is forcibly terminated by the worker.

If the timeout is longer than the retry_after then the first instance of the job could still be executing when the reserved_at time expires. This then allows another worker to initiate a second instance of the job, leading to unpredictable results.
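A configuration that avoids this overlap might look like the fragment below. It is a sketch rather than a full config/queue.php: the database connection and the concrete numbers (600 and 540, taken from the question) are assumptions.

```php
<?php

// config/queue.php (fragment); values are illustrative.
return [
    'connections' => [
        'database' => [
            'driver' => 'database',
            'table' => 'jobs',
            'queue' => 'default',
            // Seconds after which a reserved job is treated as abandoned and
            // may be picked up again. Keep this above the longest timeout.
            'retry_after' => 600,
        ],
    ],
];

// The worker is then started with a shorter timeout, for example:
//   php artisan queue:work --timeout=540
```

With these values a stuck job is killed at 540 seconds, a full minute before its reservation expires at 600 seconds, so a second worker should never start the job while the first is still running it.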

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Snapey