Alertmanager not sending out alerts to Slack

I configured Alertmanager with Prometheus, and according to the Prometheus web interface the alerts are firing. However, no Slack message shows up, and I am wondering whether ufw needs to be configured or whether there is some other configuration I missed.

The alertmanager service is running and Prometheus shows the alerts as "firing".
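A direct way to rule out the firewall and the webhook itself is to check whether the firing alerts actually reach Alertmanager and whether Slack accepts a plain test message from this host (the webhook URL below stands in for the redacted one in alertmanager.yml):

# Do the firing alerts show up in Alertmanager? (v2 API)
curl -s http://localhost:9093/api/v2/alerts

# Does the Slack webhook accept a message at all from this host?
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"test from the alertmanager host"}' \
  'https://hooks.slack.com/services/my_id_removed...'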

Here are my config files:

alertmanager.yml:

global:
  slack_api_url: 'https://hooks.slack.com/services/my_id_removed...'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'slack_general'
receivers:
#- name: 'web.hook'
#  webhook_configs:
#  - url: 'http://127.0.0.1:5001/'
- name: slack_general
  slack_configs:
  - channel: '#alerts'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
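The syntax of this file can be verified with amtool, which ships with Alertmanager (the path matches the --config.file flag in the systemd unit further down):

amtool check-config /opt/alertmanager/alertmanager.yml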

prometheus.yml:

global:
  scrape_interval: 10s
  evaluation_interval: 15s # Evaluates rules every 15s. Default is 1m
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']
rule_files:
  - rules.yml
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090', 'localhost:9104']

  - job_name: 'node_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['leo:9100', 'dog:9100']

  - job_name: 'alertmanager'
    static_configs:
      - targets: ['localhost:9093']
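Note that rule_files points at rules.yml while the rules file shown below is labelled alerts.yml; promtool can confirm that the file Prometheus actually references parses and loads (filenames as in this config):

promtool check config prometheus.yml
promtool check rules rules.yml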

alerts.yml:

groups:
 - name: test
   rules:
   - alert: InstanceDown
     expr: up == 0
     for: 1m
   - alert: HostHighCpuLoad
     expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
     for: 0m
     labels:
       severity: warning
     annotations:
       summary: Host high CPU load (instance {{ $labels.instance }})
       description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
   - alert: HostOutOfMemory
     expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
     for: 2m
     labels:
       severity: warning
     annotations:
       summary: Host out of memory (instance {{ $labels.instance }})
       description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
   # Please add ignored mountpoints in node_exporter parameters like
   # "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run)($|/)".
   # Same rule using "node_filesystem_free_bytes" will fire when disk fills for non-root users.
   - alert: HostOutOfDiskSpace
     expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 8 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
     for: 2m
     labels:
       severity: warning
     annotations:
       summary: Host out of disk space (instance {{ $labels.instance }})
       description: "Disk is almost full (< 8% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"


 sudo systemctl status alertmanager.service
● alertmanager.service - Alertmanager for prometheus
     Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2022-03-05 22:02:31 CET; 6min ago
   Main PID: 9398 (alertmanager)
      Tasks: 30 (limit: 154409)
     Memory: 21.8M
     CGroup: /system.slice/alertmanager.service
             └─9398 /opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data

Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.094Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4>
Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.094Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"
Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.098Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=192.168.0.2 port=9094
Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.099Z caller=cluster.go:671 component=cluster msg="Waiting for gossip to settle..." interval=2s
Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.127Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/opt/alertmanager/alertma>
Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.128Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/opt/alertma>
Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.131Z caller=main.go:518 msg=Listening address=:9093
Mar 05 22:02:31 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:31.131Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
Mar 05 22:02:33 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:33.099Z caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000022377s
Mar 05 22:02:41 leo alertmanager[9398]: level=info ts=2022-03-05T21:02:41.101Z caller=cluster.go:688 component=cluster msg="gossip settled; proceeding" elapsed=10.002298352s
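The status output above only shows the startup messages; failed Slack notifications would be logged by Alertmanager's notify component, so filtering the journal for them is more telling (the grep pattern is only a suggestion):

journalctl -u alertmanager.service --since "1 hour ago" | grep -iE 'notify|slack|error'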


Solution 1:[1]

I believe I have found the reason why the alerts sometimes came through and sometimes did not: Alertmanager failed to start properly after a reboot because the unit did not wait for the network (and therefore Prometheus) to be available.

This fixed it:

sudo nano /etc/systemd/system/alertmanager.service

Add "wants" and "after":

[Unit]
Description=Alertmanager for prometheus
Wants=network-online.target
After=network-online.target

Reboot.
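If a full reboot is inconvenient, reloading systemd and restarting the unit should be enough to pick up the change (unit name as above):

sudo systemctl daemon-reload
sudo systemctl restart alertmanager.service
systemctl is-enabled alertmanager.service   # should print "enabled"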

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution source
[1] Solution 1: merlin