Unable to connect to basic Google Cloud Run service: upstream connect error or disconnect/reset before headers. reset reason: remote reset error

I've been successfully running a gRPC service on GCP Cloud Run for over a year. Suddenly, it stopped working and responds to each request with...

StatusCode="Unavailable", Detail="upstream connect error or disconnect/reset before headers. reset reason: remote reset"

There was no new revision or deployment. It just started responding this way out of the blue. There is no proxy, no VPC, no gateway, no ingress controller, I'm just using the URL provided by Cloud Run with port 443 specified. It's the simplest deploy possible.

I've tried disabling end-to-end HTTP/2 (which worked previously), creating a brand-new service instance with a new name, and changing the runtime environment generation; none of these moved me any closer to a resolution. I have not migrated to ESPv2, so that should not be a concern either.

What could possibly be causing this?



Solution 1:[1]

I've previously had the exact same issue you're describing and spent hours debugging and redeploying. I verified that the gRPC server was returning successfully, then went down a rabbit hole of .NET Core's handling of HTTP/2 cleartext and TLS negotiation/downgrading (since Cloud Run terminates TLS and .NET Core gRPC seems to hate unencrypted HTTP/2 payloads). This led me mostly nowhere and fixed nothing.

In the end, I came back the next day, redeployed some old revisions (that were previously broken), and it all worked.

My assumption is that something is going on on the Cloud Run side of things, but I'm not sure.

(Obviously not a good answer, but I don't have the reputation to comment.)

Solution 2:[2]

When a new connection is opened, Cloud Run checks whether the first request is an HTTP/2 (h2) request carrying a content-type: application/grpc header. Once Cloud Run matches gRPC, it sends all the connections to the gRPC server.
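To make the header check concrete, here is a minimal sketch of what a gRPC request looks like on the wire according to the public gRPC-over-HTTP/2 protocol description. It uses only the Python standard library. The host, service, and method names are hypothetical placeholders, and looks_like_grpc is an illustrative guess at a content-type check, not Cloud Run's actual matching logic.

```python
import struct

GRPC_CONTENT_TYPE = "application/grpc"


def grpc_request_headers(authority, service, method):
    """Build the HTTP/2 headers that open a gRPC call."""
    return [
        (":method", "POST"),                   # gRPC calls are always POST
        (":scheme", "https"),
        (":path", f"/{service}/{method}"),     # /package.Service/Method
        (":authority", authority),
        ("content-type", GRPC_CONTENT_TYPE),   # the header being matched
        ("te", "trailers"),                    # required by the gRPC protocol
    ]


def frame_message(payload, compressed=False):
    """Prefix a serialized message with the 5-byte gRPC frame header:
    1 byte compressed flag + 4 byte big-endian message length."""
    return struct.pack(">BI", int(compressed), len(payload)) + payload


def looks_like_grpc(headers):
    """Illustrative check only (an assumption, not Cloud Run's code):
    accept application/grpc and subtypes such as application/grpc+proto."""
    for name, value in headers:
        if name.lower() == "content-type":
            return (value == GRPC_CONTENT_TYPE
                    or value.startswith(GRPC_CONTENT_TYPE + "+"))
    return False


# Hypothetical host, service, and method, used only for illustration.
headers = grpc_request_headers("my-service.example.com", "helloworld.Greeter", "SayHello")
print(looks_like_grpc(headers))
print(frame_message(b"abc"))
```

A request sent with, say, content-type: application/json would not pass a check like this, which is one way a client that is not really speaking gRPC could end up being handled as plain HTTP.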

Sometimes, disabling Cloud Run's end-to-end HTTP/2 (which you mentioned you have already tried) resolves the issue, as suggested in the ServerFault case.

The error you are receiving mostly seems to indicate that ESPv2 can't reach the service's backend.

As a workaround, I suggest the following ways to mitigate the error.

  1. Configure ESPv2 with the correct backend address as suggested here.

  2. If ESPv2 is already configured, force it to use IPv4 addresses via the --backend_dns_lookup_family flag. You can find more details under the “DNS lookup” section in this documentation.

  3. Configure your requests to be gRPC requests.

Also, have a look at this GitHub Link.

Solution 3:[3]

#NotAnAnswer

Here is a link to the incident details

https://status.cloud.google.com/incidents/qfgJm8m4WPn2Ej2Z7vc2

It started on Feb 10th!

Solution 4:[4]

I don't know for sure whether you were having the same issue as I was, but this error has tended to happen since 04/15/2021 (or so) on .NET gRPC or HTTP/2 servers.

Here is the answer I wrote to my own question on Stack Overflow: https://stackoverflow.com/a/71249250/4829094. There you'll find Google Cloud Platform support's answers and the fix roadmap.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Michael Lentin
Solution 2 Mousumi Roy
Solution 3 Craig
Solution 4 Vince.Bdn