Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


kubernetes - prometheus-blackbox-exporter is firing false positive alerts

We have set up a full Prometheus stack (Prometheus, Grafana, Alertmanager, Node Exporter and Blackbox exporter) in our Kubernetes cluster using the community Helm charts. The monitoring stack is deployed in its own namespace, and our main software, which consists of microservices, is deployed in the default namespace. Alerting itself works fine, but the Blackbox exporter, which we use to probe the HTTP liveness/readiness endpoints of our microservices, does not seem to be scraping metrics correctly (my guess) and regularly fires false positive alerts.

My configuration (in values.yaml) related to the issue looks like:

- alert: InstanceDown
  expr: up == 0
  for: 5m
  annotations:
    title: 'Instance {{ $labels.instance }} down'
    description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
- alert: ExporterIsDown
  expr: up{job="prometheus-blackbox-exporter"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Blackbox exporter is down"
    description: "Blackbox exporter is down or not being scraped correctly"
...
...
...
extraScrapeConfigs:  |
   - job_name: 'prometheus-blackbox-exporter'
     metrics_path: /probe
     params:
       module: [http_2xx]
     static_configs:
       - targets:
         - http://service1.default.svc.cluster.local:8082/actuator/health/liveness
         - http://service2.default.svc.cluster.local:8081/actuator/health/liveness
         - http://service3.default.svc.cluster.local:8080/actuator/health/liveness
     relabel_configs:
       - source_labels: [__address__]
         target_label: __param_target
       - source_labels: [__param_target]
         target_label: instance
       - target_label: __address__
         replacement: prometheus-blackbox-exporter:9115

Both alerts fire every hour, even though the endpoints are fully reachable at those times.

We're using the default prometheus-blackbox-exporter/values.yaml file:

config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        no_follow_redirects: false
        preferred_ip_protocol: "ip4"

The alert emails look like this:

[5] Firing
Labels
alertname = InstanceDown
instance = http://service1.default.svc.cluster.local:8082/actuator/health/liveness
job = prometheus-blackbox-exporter
severity = critical

The other type of email:

Labels
alertname = ExporterIsDown
instance = http://service1.default.svc.cluster.local:8082/actuator/health/liveness
job = prometheus-blackbox-exporter
severity = warning
Annotations
description = Blackbox exporter is down or not being scraped correctly
summary = Blackbox exporter is down

Another odd thing I noticed is that the Prometheus UI does not show any probe_* metrics like the ones shown here: https://lapee79.github.io/en/article/monitoring-http-using-blackbox-exporter/ I am not sure what we are doing wrong or what we have missed, but getting hundreds of false positive emails is very annoying.
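For context on the probe_* metrics: for a Blackbox exporter job, up only tells you whether Prometheus managed to scrape the exporter's /probe endpoint, while the result of the probe itself is reported in probe_success (1 for success, 0 for failure). The catch-all up == 0 rule also matches the Blackbox job, which would explain why both alerts fire for the same instance. Below is a minimal sketch of a rule based on the probe result, assuming the same job name as in the scrape config above. The alert name, duration and severity are hypothetical and not part of the original setup.

```yaml
# Hypothetical rule: the alert name, "for" duration and severity are illustrative.
- alert: EndpointProbeFailed
  # probe_success is 1 when the blackbox probe succeeded and 0 when it failed
  expr: probe_success{job="prometheus-blackbox-exporter"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Probe failed for {{ $labels.instance }}"
    description: "{{ $labels.instance }} has failed the http_2xx probe for more than 5 minutes."
```

A rule like this can only fire once the probe_* series actually appear in Prometheus, so it does not by itself explain why they are missing.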

Question from: https://stackoverflow.com/questions/65840967/prometheus-blackbox-exporter-is-firing-false-positive-alerts
