'Lots of "Uncaught signal: 6" errors in Cloud Run

I have a Python (3.x) webservice deployed in GCP. Everytime Cloud Run is shutting down instances, most noticeably after a big load spike, I get many logs like these Uncaught signal: 6, pid=6, tid=6, fault_addr=0. together with [CRITICAL] WORKER TIMEOUT (pid:6) They are always signal 6.

The service is using FastAPI and Gunicorn running in a Docker with this start command

CMD gunicorn -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8080 app.__main__:app

The service is deployed using Terraform with 1 gig of ram, 2 cpu's and the timeout is set to 2 minutes

resource "google_cloud_run_service" <ressource-name> {
  name     = <name>
  location = <location>

  template {
    spec {
      service_account_name = <sa-email>
      timeout_seconds = 120
      containers {
        image = var.image
        env {
          name = "GCP_PROJECT"
          value = var.project
        }
        env {
          name = "BRANCH_NAME"
          value = var.branch
        }
        resources {
          limits = {
            cpu = "2000m"
            memory = "1Gi"
          }
        }
      }
    }
  }
  autogenerate_revision_name = true
}

I have already tried tweaking the resources and timeout in Cloud Run, using the --timeout and --preload flag for gunicorn as that is what people always seem to recommend when googling the problem but all without success. I also dont exactly know why the workers are timing out.



Solution 1:[1]

Extending on the top answer which is correct, You are using GUnicorn which is a process manager that manages Uvicorn processes which runs the actual app.

When Cloudrun wants to shutdown the instance (due to lack of requests probably) it will send a signal 6 to process 1. However, GUnicorn occupies this process as the manager and will not pass it to the Uvicorn workers for handling - thus you receive the Unhandled signal 6.

The simplest solution, is to run Uvicorn directly instead of through GUnicorn (possibly with a smaller instance) and allow the scaling part to be handled via Cloudrun.

CMD ["uvicorn", "app.__main__:app", "--host", "0.0.0.0", "--port", "8080"]

Solution 2:[2]

This error happens when a background process is aborted. There are some advantages of running background threads on cloud just like for other applications. Luckily, you can still use them on Cloud Run without processes getting aborted. To do so, when deploying, chose the option "CPU always allocated" instead of "CPU only allocated during request processing"

For more details, check https://cloud.google.com/run/docs/configuring/cpu-allocation

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Or Zilberman
Solution 2 Koffi Koudjonou Thierry