Pulumi and GCP Uptime Checks deployments failing from time to time
We recently added GCP uptime checks to our Pulumi stack. We create the uptime check like this:
ucc, err := monitoring.NewUptimeCheckConfig(ctx, name, &monitoring.UptimeCheckConfigArgs{
	DisplayName: pulumi.String("uptime check example"),
	HttpCheck: &monitoring.UptimeCheckConfigHttpCheckArgs{
		Path:          pulumi.String(fmt.Sprintf("/%s/status", "github")),
		Port:          pulumi.Int(443),
		RequestMethod: pulumi.String("GET"),
		UseSsl:        pulumi.Bool(true),
		ValidateSsl:   pulumi.Bool(true),
	},
	MonitoredResource: &monitoring.UptimeCheckConfigMonitoredResourceArgs{
		Labels: pulumi.StringMap{
			"host": pulumi.String(targetUrl),
		},
		Type: pulumi.String("uptime_url"),
	},
	Period:  pulumi.String("60s"),
	Timeout: pulumi.String("10s"),
})
Then I decided to add an alert policy for this uptime check.
Note: the uptime check created above is passed in here as uptimeCheck.
args := monitoring.AlertPolicyArgs{
	DisplayName: pulumi.String(name),
	Combiner:    pulumi.String("AND"),
	Conditions: monitoring.AlertPolicyConditionArray{
		monitoring.AlertPolicyConditionArgs{
			// pulumi.String takes a single argument, so formatting needs pulumi.Sprintf.
			DisplayName: pulumi.Sprintf("Health check alerts for github %s", service.ShortName),
			ConditionThreshold: monitoring.AlertPolicyConditionConditionThresholdArgs{
				Filter:   pulumi.Sprintf("metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"%s\" AND resource.type=\"uptime_url\"", uptimeCheck.UptimeCheckId),
				Duration: pulumi.String("60s"),
				Trigger: monitoring.AlertPolicyConditionConditionThresholdTriggerArgs{
					Count: pulumi.IntPtr(1),
				},
				ThresholdValue: pulumi.Float64Ptr(1),
				Comparison:     pulumi.String("COMPARISON_LT"),
				Aggregations: monitoring.AlertPolicyConditionConditionThresholdAggregationArray{
					monitoring.AlertPolicyConditionConditionThresholdAggregationArgs{
						AlignmentPeriod:  pulumi.String("60s"),
						PerSeriesAligner: pulumi.String("ALIGN_COUNT_TRUE"),
					},
				},
			},
		},
	},
	// "alerts" is the placeholder from the original post; this field takes a
	// list of notification channel resource names.
	NotificationChannels: pulumi.StringArray{pulumi.String("alerts")},
}
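To make the alert filter easier to read, here is a minimal plain-Go sketch of the string that the policy's Filter field resolves to once the uptime check ID is known. The helper name uptimeAlertFilter and the sample ID are hypothetical and only for illustration. In the real stack the ID is a Pulumi output, so pulumi.Sprintf (as used above) is still needed there.

```go
package main

import "fmt"

// uptimeAlertFilter is a hypothetical helper (not part of the Pulumi SDK).
// It builds the same Cloud Monitoring filter string used in the alert policy
// above, for an uptime check ID that is already a plain string.
func uptimeAlertFilter(checkID string) string {
	return fmt.Sprintf(
		`metric.type="monitoring.googleapis.com/uptime_check/check_passed" AND metric.label.check_id="%s" AND resource.type="uptime_url"`,
		checkID,
	)
}

func main() {
	// "my-check-id" is a made-up ID used only to show the resulting filter.
	fmt.Println(uptimeAlertFilter("my-check-id"))
}
```

Printing the resolved filter this way shows which uptime check ID the alert policy points at, which can help when checking whether the policy still references the old check that fails to delete.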
This worked fine on the first deployment, but subsequent ones started to fail:
error: deleting urn:pulumi:env::company::gcp:monitoring/uptimeCheckConfig:UptimeCheckConfig::uptime-check-github: 1 error occurred:
Error when reading or editing UptimeCheckConfig: googleapi: Error 400: Request contains an invalid argument.
Observed behavior
What I noticed is that the new uptime checks were created in our account, but GCP ended up in a state where it could not delete the previous uptime check. The only way I managed to fix the stack was by deleting the old uptime checks manually.
Has anyone experienced this?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
