Category "monitoring"

Alerts firing on Prometheus but not on Alertmanager

I can't figure out why Alertmanager is not getting alerts from Prometheus. I would appreciate swift assistance with this problem. I'm fairly new with us

Spark History server not listing completed jars

I'm running Spark standalone jobs on Windows. I would like to monitor my Spark jobs using the Spark History Server. I have launched the Spark History Server with be

How to calculate the instant rate of a gauge metric?

How can I calculate the per-second instant rate of increase of the time series in Prometheus or Grafana without using rate() or irate()? This drive function is

How to capture all request logs from OPA server

I'm using Open Policy Agent (https://www.openpolicyagent.org/docs/latest/). I'm running it in Kubernetes. I have various user traffic going on. I have such contai

How to monitor health of AWS CloudWatch agents?

I have hundreds of EC2 servers running CloudWatch agents. I would like to have a monitoring mechanism to check the health of the CloudWatch agents. And sh

How To Reduce Prometheus (Federation) Scrape Duration

I have a Prometheus federation with 2 Prometheus servers - one per Kubernetes cluster and a central one to rule them all. Over time the scrape durations increase.

Snowflake Snowpipe - Email Alert Mechanism

I am planning to use Snowpipe to load data from Kafka, but the support team monitoring the pipe jobs needs an alert mechanism. How can I implement an alert mech

Prometheus increase not handling process restarts

I am trying to figure out the behavior of Prometheus' increase() querying function with process restarts. When there is a process restart within a 2m interval

Provisioning with Ansible and Vagrant multiple vagrantfiles

I'm creating a monitoring environment that has monitoring_servers and monitored_boxes, and of course an Ansible controller. For testing roles etc I've created a n

Compare 2 metrics in Prometheus

I'm new to Prometheus and need help. I have a custom metric on my 2 servers (it just shows the version of the application): app_version{plant="dev",env="demo"} 55.119 app_ver

How to query sar (sysstat) for more than one day's data points

The 'sar' command does not seem to accept a date-and-time for the start time (-s) or end time (-e), only a time. So, how to query 'sar' for more than one day's data points by pa

Prometheus create label from metric label

We are running the node-exporter in containers. To quickly identify on which host each node-exporter is running, I created a metric that looks like this: host{h

How to run Prometheus monitoring with 1GB of RAM?

I have tried all kinds of flags to avoid keeping much data in RAM, without success. Even if I restrict "storage.local.memory-chunks" and also "chunks to pe

Silence prometheus alerts based on label value / Ignore alerts from label

tl;dr I have a label in Prometheus called "ignore" with value "yes": metric_test{label1="label1",ignore="yes"} 1 I want to disable alerts for any metrics with