Upgrade blackbox-exporter and reduce logging
Closed, Resolved, Public

Description

blackbox-exporter 0.25 (already in Debian testing/unstable) adds a --log.prober flag that sets the log level for probe messages only. This is useful to us because we want to see errors from probes, which are currently logged only at debug level:

2025-01-29T11:10:18.262467+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[3153432]: ts=2025-01-29T11:10:18.262Z caller=handler.go:184 module=ssh_banner target=kafka-logging1003.mgmt.eqiad.wmnet:22 level=debug msg="Error dialing TCP" err="dial tcp4 10.65.0.207:22: i/o timeout"
2025-01-29T11:10:18.262619+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[3153432]: ts=2025-01-29T11:10:18.262Z caller=handler.go:184 module=ssh_banner target=kafka-logging1003.mgmt.eqiad.wmnet:22 level=debug msg="Probe failed" duration_seconds=5.000793578

With that change we can instead pass --log.prober error and have probe errors reported at error level:

2025-01-29T11:23:46.668788+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[2677683]: ts=2025-01-29T11:23:46.668Z caller=level.go:71 module=ssh_banner target=db1170.mgmt.eqiad.wmnet:22 level=error msg="Error dialing TCP" err="dial tcp4 10.65.0.182:22: i/o timeout"
2025-01-29T11:23:46.669044+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[2677683]: ts=2025-01-29T11:23:46.668Z caller=level.go:71 module=ssh_banner target=db1170.mgmt.eqiad.wmnet:22 level=error msg="Probe failed" duration_seconds=5.000530174
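
To illustrate why the level matters, here is a minimal Python sketch that pulls the key=value fields out of lines like the ones above and keeps only those at error level. The function names parse_logfmt and probe_errors are hypothetical, and the parsing assumes the payload starts at the first "ts=" key and quotes values the way a shell would; it is not how our logging pipeline actually processes these messages.

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Parse the key=value payload of a blackbox-exporter log line."""
    # Assumption: everything before the first "ts=" is the syslog prefix.
    start = line.find("ts=")
    payload = line[start:] if start != -1 else line
    fields = {}
    # shlex.split keeps quoted values such as msg="Probe failed" together.
    for token in shlex.split(payload):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def probe_errors(lines):
    """Return the parsed fields of every line logged at level=error."""
    return [f for f in map(parse_logfmt, lines) if f.get("level") == "error"]
```

With the debug-level lines from the first excerpt, probe_errors returns nothing; with the lines logged under --log.prober error, each failed probe shows up with its module, target and err fields.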

Implementation steps:

  • Import the prometheus-blackbox-exporter 0.25.0-2 package from testing into our internal bookworm repo (possibly other releases too, depending on the audit in the next step).
  • Audit which hosts use blackbox-exporter, in both cloud and production, and upgrade them to the version above.
  • Change the exporter flags in modules/prometheus/templates/initscripts/prometheus-blackbox-exporter.systemd_override.erb to include --log.prober error as described above.
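
For the audit step, a rough Python sketch of the version check could look like the following. The helper name supports_log_prober is hypothetical, the 0.25 threshold is taken from the description above, and the parsing is a simplification of Debian version rules (it only drops an epoch and the revision). How the installed versions are collected from hosts is not shown.

```python
def supports_log_prober(version: str) -> bool:
    """Return True if a package version is at least upstream 0.25."""
    # Drop an optional epoch ("1:") and the Debian revision ("-2", "-0~bpo12+1").
    upstream = version.split(":", 1)[-1].split("-", 1)[0]
    # Compare only major.minor; assumes both components are plain integers.
    major, minor = (int(part) for part in upstream.split(".")[:2])
    return (major, minor) >= (0, 25)
```

Hosts for which this returns False would need the upgraded package before the new flag is added to their unit, since an older exporter would presumably reject the unknown flag.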

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2025-05-09T12:51:33Z] <godog> upload prometheus-blackbox-exporter 0.26.0-0~bpo12+1 to bookworm-wikimedia - T385022

Change #1143810 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] prometheus: move blackbox-exporter to log prober errors

https://gerrit.wikimedia.org/r/1143810

Change #1143810 merged by Filippo Giunchedi:

[operations/puppet@production] prometheus: move blackbox-exporter to log prober errors

https://gerrit.wikimedia.org/r/1143810

fgiunchedi claimed this task.

This is complete, resolving