Pulumi deployments with GCP uptime checks failing from time to time

We recently added GCP uptime checks to our Pulumi stack. We create the uptime check like this:

ucc, err := monitoring.NewUptimeCheckConfig(ctx, name, &monitoring.UptimeCheckConfigArgs{
    DisplayName: pulumi.String("uptime check example"),
    HttpCheck: &monitoring.UptimeCheckConfigHttpCheckArgs{
        Path:          pulumi.String(fmt.Sprintf("/%s/status", "github")),
        Port:          pulumi.Int(443),
        RequestMethod: pulumi.String("GET"),
        UseSsl:        pulumi.Bool(true),
        ValidateSsl:   pulumi.Bool(true),
    },
    MonitoredResource: &monitoring.UptimeCheckConfigMonitoredResourceArgs{
        // For the uptime_url resource type, "host" is the hostname being checked.
        Labels: pulumi.StringMap{
            "host": pulumi.String(targetUrl),
        },
        Type: pulumi.String("uptime_url"),
    },
    Period:  pulumi.String("60s"), // how often the check runs
    Timeout: pulumi.String("10s"), // how long each request may take
})
if err != nil {
    return err
}
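Period and Timeout are duration strings in seconds (e.g. "60s") that GCP validates against fixed limits. The sketch below is a hypothetical pre-flight helper (validateUptimeTiming is not part of Pulumi or the GCP SDK) that checks the two values used above. It assumes the limits given in the Cloud Monitoring API reference at the time of writing (period of 60s, 300s, 600s or 900s; timeout between 1s and 60s), so re-check them before relying on it. Since these values were accepted on the first deployment, they are unlikely to explain the delete error; the sketch only illustrates the constraints.

```go
package main

import (
	"fmt"
	"time"
)

// allowedPeriods lists the uptime check periods documented as supported
// (1, 5, 10 and 15 minutes). Assumption: re-check against the current API docs.
var allowedPeriods = map[time.Duration]bool{
	1 * time.Minute:  true,
	5 * time.Minute:  true,
	10 * time.Minute: true,
	15 * time.Minute: true,
}

// validateUptimeTiming is a hypothetical helper that checks the Period and
// Timeout strings passed to monitoring.UptimeCheckConfigArgs.
func validateUptimeTiming(period, timeout string) error {
	p, err := time.ParseDuration(period)
	if err != nil {
		return fmt.Errorf("invalid period %q: %w", period, err)
	}
	if !allowedPeriods[p] {
		return fmt.Errorf("period %s is not one of 60s, 300s, 600s, 900s", period)
	}
	t, err := time.ParseDuration(timeout)
	if err != nil {
		return fmt.Errorf("invalid timeout %q: %w", timeout, err)
	}
	// Assumed documented range for the timeout: 1s to 60s inclusive.
	if t < time.Second || t > time.Minute {
		return fmt.Errorf("timeout %s must be between 1s and 60s", timeout)
	}
	return nil
}

func main() {
	// Values taken from the uptime check above.
	fmt.Println(validateUptimeTiming("60s", "10s"))        // <nil>
	fmt.Println(validateUptimeTiming("30s", "10s") != nil) // true
}
```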

Then I added an alert policy for this uptime check.

Note: uptimeCheck in the code below is the uptime check created above, passed in to this code.

args := monitoring.AlertPolicyArgs{
    DisplayName: pulumi.String(name),
    Combiner:    pulumi.String("AND"),
    Conditions: monitoring.AlertPolicyConditionArray{
        monitoring.AlertPolicyConditionArgs{
            // pulumi.String takes a single string; pulumi.Sprintf formats one.
            DisplayName: pulumi.Sprintf("Health check alerts for github %s", service.ShortName),
            ConditionThreshold: &monitoring.AlertPolicyConditionConditionThresholdArgs{
                // Ties the condition to this specific uptime check via its ID (an Output).
                Filter:   pulumi.Sprintf("metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"%s\" AND resource.type=\"uptime_url\"", uptimeCheck.UptimeCheckId),
                Duration: pulumi.String("60s"),
                Trigger: &monitoring.AlertPolicyConditionConditionThresholdTriggerArgs{
                    Count: pulumi.IntPtr(1),
                },
                // Violated when fewer than 1 check passed in the alignment period.
                ThresholdValue: pulumi.Float64Ptr(1),
                Comparison:     pulumi.String("COMPARISON_LT"),
                Aggregations: monitoring.AlertPolicyConditionConditionThresholdAggregationArray{
                    monitoring.AlertPolicyConditionConditionThresholdAggregationArgs{
                        AlignmentPeriod:  pulumi.String("60s"),
                        PerSeriesAligner: pulumi.String("ALIGN_COUNT_TRUE"),
                    },
                },
            },
        },
    },
    // Entries must be notification channel resource names
    // (projects/<project>/notificationChannels/<id>); "alerts" is a placeholder.
    NotificationChannels: pulumi.StringArray{pulumi.String("alerts")},
}
if _, err := monitoring.NewAlertPolicy(ctx, name, &args); err != nil {
    return err
}
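The Filter ties the alert policy to one specific uptime check through its check_id label, and the threshold settings decide when the condition is violated. The following stdlib-only sketch reproduces both pieces outside Pulumi: uptimeCheckFilter builds the same filter string for a given check ID, and conditionFires is a simplified model of ALIGN_COUNT_TRUE plus COMPARISON_LT with threshold 1 and Trigger.Count 1, evaluated on a single 60s window. Both helper names and the sample location names are hypothetical, and the model ignores Duration and missing-data handling, so treat it as an illustration rather than how Cloud Monitoring actually evaluates conditions.

```go
package main

import "fmt"

// uptimeCheckFilter (hypothetical name) builds the same Monitoring filter
// string that the alert policy passes to pulumi.Sprintf.
func uptimeCheckFilter(checkID string) string {
	return fmt.Sprintf(
		`metric.type="monitoring.googleapis.com/uptime_check/check_passed" AND metric.label.check_id="%s" AND resource.type="uptime_url"`,
		checkID,
	)
}

// seriesViolates models one alignment window for one time series:
// ALIGN_COUNT_TRUE counts the passing (true) samples, and COMPARISON_LT
// flags the series when that count is below the threshold.
func seriesViolates(samples []bool, threshold float64) bool {
	count := 0
	for _, passed := range samples {
		if passed {
			count++
		}
	}
	return float64(count) < threshold
}

// conditionFires applies Trigger.Count: the condition is met when at least
// triggerCount series violate the threshold in the window.
func conditionFires(series map[string][]bool, threshold float64, triggerCount int) bool {
	violating := 0
	for _, samples := range series {
		if seriesViolates(samples, threshold) {
			violating++
		}
	}
	return violating >= triggerCount
}

func main() {
	fmt.Println(uptimeCheckFilter("example-check-id"))

	// Hypothetical samples for one 60s window, one series per checker location.
	window := map[string][]bool{
		"location-a": {true},
		"location-b": {false},
	}
	fmt.Println(conditionFires(window, 1, 1)) // true: location-b had no passing check
}
```

Because the filter embeds a concrete check ID, a replaced uptime check would get a new server-assigned ID, and the alert policy would have to be updated to match it.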

This worked fine on the first deployment, but subsequent deployments started to fail with:

error: deleting urn:pulumi:env::company::gcp:monitoring/uptimeCheckConfig:UptimeCheckConfig::uptime-check-github: 1 error occurred:
Error when reading or editing UptimeCheckConfig: googleapi: Error 400: Request contains an invalid argument.

Observed behavior

What I noticed is that the new uptime checks were created in our account, but GCP ended up in a state where it could not delete the previous uptime check. The only way I managed to fix the stack was by deleting the old uptime checks manually.

Has anyone experienced this?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
