'Continuous WebJob restarts every 60 minutes

Our Continuous WebJob restarts every 60 minutes for no reason.

Our Config:

  1. is_singleton -> true
  2. Always on -> On
  3. Auto-Heal -> Off

The job reads from an event hub and writes the data to a blob storage every minute.

The strange thing is that the first log entry is always "Status changed to Starting" even though the job is already running.`

The log/error given by the failure.


job_log.txt

...
[04/21/2022 17:58:07 > 168694: INFO] [17:58:07 INF] 04/21/2022 17:57:00 +00:00
[04/21/2022 17:59:06 > 168694: INFO] [17:59:06 INF] 04/21/2022 17:58:00 +00:00
[04/21/2022 17:59:58 > 168694: SYS INFO] Status changed to Starting
[04/21/2022 17:59:58 > 168694: SYS INFO] WebJob singleton setting is True
[04/21/2022 17:59:58 > 168694: SYS INFO] Status changed to InactiveInstance
[04/21/2022 17:59:59 > 168694: SYS INFO] WebJob is stopping due to website shutting down
[04/21/2022 17:59:59 > 168694: SYS INFO] Status changed to Stopping
[04/21/2022 18:00:03 > 168694: SYS INFO] Status changed to InactiveInstance
[04/21/2022 18:00:04 > 168694: ERR ] Thread was being aborted.
[04/21/2022 18:00:04 > 168694: SYS INFO] WebJob process was aborted
[04/21/2022 18:00:04 > 168694: SYS INFO] Status changed to Stopped
[04/21/2022 18:00:04 > 168694: SYS INFO] WebJob singleton lock is released
[04/21/2022 18:00:08 > 168694: SYS INFO] WebJob singleton lock is acquired
[04/21/2022 18:00:09 > 168694: SYS INFO] Run script 'run.cmd' with script host - 'WindowsScriptHost'
[04/21/2022 18:00:09 > 168694: SYS INFO] Status changed to Running
[04/21/2022 18:00:10 > 168694: INFO] 
[04/21/2022 18:00:10 > 168694: INFO] C:\local\Temp\jobs\continuous\<name>\wln3ben4.tad>dotnet <name>.dll 
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Archivator Started.
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Starting EventHub...
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Now listening on: http://localhost:5000
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Application started. Press Ctrl+C to shut down.
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Hosting environment: Production
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Content root path: C:\local\Temp\jobs\continuous\<name>\wln3ben4.tad
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] EventHub Started.
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 7: Initialized
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 3: Initialized
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 2: Initialized
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 0: Initialized
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 1: Initialized
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 4: Initialized
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 6: Initialized
[04/21/2022 18:00:10 > 168694: INFO] [18:00:10 INF] Partition 5: Initialized
[04/21/2022 18:00:14 > 168694: INFO] [18:00:14 INF] 04/21/2022 17:56:00 +00:00
[04/21/2022 18:00:16 > 168694: INFO] [18:00:16 INF] 04/21/2022 17:57:00 +00:00
[04/21/2022 18:00:16 > 168694: INFO] [18:00:16 INF] 04/21/2022 17:58:00 +00:00
[04/21/2022 18:00:16 > 168694: INFO] [18:00:16 INF] 04/21/2022 17:59:00 +00:00

One hour later:

...
[04/21/2022 18:58:06 > 168694: INFO] [18:58:06 INF] 04/21/2022 18:57:00 +00:00
[04/21/2022 18:59:05 > 168694: INFO] [18:59:05 INF] 04/21/2022 18:58:00 +00:00
[04/21/2022 18:59:58 > 168694: SYS INFO] Status changed to Starting
[04/21/2022 18:59:58 > 168694: SYS INFO] WebJob singleton setting is True
[04/21/2022 18:59:58 > 168694: SYS INFO] Status changed to InactiveInstance
[04/21/2022 19:00:01 > 168694: SYS INFO] WebJob is stopping due to website shutting down
[04/21/2022 19:00:01 > 168694: SYS INFO] Status changed to Stopping
[04/21/2022 19:00:03 > 168694: SYS INFO] Status changed to InactiveInstance
[04/21/2022 19:00:05 > 168694: INFO] [19:00:05 INF] 04/21/2022 18:59:00 +00:00
[04/21/2022 19:00:06 > 168694: ERR ] Thread was being aborted.
[04/21/2022 19:00:06 > 168694: SYS INFO] WebJob process was aborted
[04/21/2022 19:00:06 > 168694: SYS INFO] Status changed to Stopped
[04/21/2022 19:00:06 > 168694: SYS INFO] WebJob singleton lock is released
[04/21/2022 19:00:08 > 168694: SYS INFO] WebJob singleton lock is acquired
[04/21/2022 19:00:09 > 168694: SYS INFO] Run script 'run.cmd' with script host - 'WindowsScriptHost'
[04/21/2022 19:00:09 > 168694: SYS INFO] Status changed to Running
[04/21/2022 19:00:10 > 168694: INFO] 
[04/21/2022 19:00:10 > 168694: INFO] C:\local\Temp\jobs\continuous\<name>\y2nhkwrd.d4o>dotnet <name>.dll 
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Archivator Started.
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Starting EventHub...
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Now listening on: http://localhost:5000
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Application started. Press Ctrl+C to shut down.
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Hosting environment: Production
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Content root path: C:\local\Temp\jobs\continuous\<name>\y2nhkwrd.d4o
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] EventHub Started.
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 7: Initialized
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 3: Initialized
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 0: Initialized
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 5: Initialized
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 4: Initialized
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 2: Initialized
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 1: Initialized
[04/21/2022 19:00:11 > 168694: INFO] [19:00:11 INF] Partition 6: Initialized
[04/21/2022 19:00:15 > 168694: INFO] [19:00:15 INF] 04/21/2022 18:56:00 +00:00
[04/21/2022 19:00:18 > 168694: INFO] [19:00:18 INF] 04/21/2022 18:57:00 +00:00

singleton.job.lock

{
  "OperationName": "Acquiring continuous WebJob singleton lock",
  "AcquiredDateTime": "2022-04-21T19:00:08.9552526Z",
  "StackTrace": "   at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)\r\n   at System.Environment.get_StackTrace()\r\n   at Kudu.Core.Infrastructure.LockFile.WriteLockInfo(String operationName, Stream lockStream) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Core\\Infrastructure\\LockFile.cs:line 225\r\n   at Kudu.Core.Infrastructure.LockFile.Lock(String operationName) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Core\\Infrastructure\\LockFile.cs:line 160\r\n   at Kudu.Core.Jobs.ContinuousJobRunner.TryGetLockIfSingleton(Boolean& acquired) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Core\\Jobs\\ContinuousJobRunner.cs:line 243\r\n   at Kudu.Core.Jobs.ContinuousJobRunner.<>c__DisplayClass24_0.<StartJob>b__0() in C:\\Kudu Files\\Private\\src\\master\\Kudu.Core\\Jobs\\ContinuousJobRunner.cs:line 103\r\n   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)\r\n   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)\r\n   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)\r\n   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)\r\n   at System.Threading.ThreadHelper.ThreadStart()",
  "InstanceId": "168694"
}

status_168694

{"Status":"Running"}


Solution 1:[1]

Make sure if this setting is present

   WEBJOBS_RESTART_TIME - Timeout in seconds between when a continuous job's process goes down (for any reason) and the time we re-launch it again (Only for continuous jobs).
  • Having unnecessary app settings on the configuration blade of WebJobs on the portal sometimes causes unnecessary reboots.
  • Identify and remove unnecessary app settings as required.
  • Check if you have defined any triggers on the website which restarts the website
  • On the App Service , Click Diagnose and solve problems in the left menu - look for the tile for Diagnostic Tools > "Availability and Performance" & "Best Practices." / Examine the WebJob's specifics

Refer this doc- Set a continuous job as singleton

The wait time between WebJob invocations is either WebJobsRestartTime (60 seconds by default) or if the WebJob Was running for at least 2 minutes there is no wait time.

Please refer GitHub and a code comment for more information

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 HarshithaVeeramalla-MT