blackbox-exporter 0.25 (already in Debian testing/unstable) adds a --log.prober flag that raises the log level of probe messages only. This is useful to us because we want to see errors from probes, which are currently logged at debug level only:
2025-01-29T11:10:18.262467+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[3153432]: ts=2025-01-29T11:10:18.262Z caller=handler.go:184 module=ssh_banner target=kafka-logging1003.mgmt.eqiad.wmnet:22 level=debug msg="Error dialing TCP" err="dial tcp4 10.65.0.207:22: i/o timeout"
2025-01-29T11:10:18.262619+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[3153432]: ts=2025-01-29T11:10:18.262Z caller=handler.go:184 module=ssh_banner target=kafka-logging1003.mgmt.eqiad.wmnet:22 level=debug msg="Probe failed" duration_seconds=5.000793578
With that change we can pass --log.prober error and have probe errors reported at error level:
2025-01-29T11:23:46.668788+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[2677683]: ts=2025-01-29T11:23:46.668Z caller=level.go:71 module=ssh_banner target=db1170.mgmt.eqiad.wmnet:22 level=error msg="Error dialing TCP" err="dial tcp4 10.65.0.182:22: i/o timeout"
2025-01-29T11:23:46.669044+00:00 filippo-prometheus-01 prometheus-blackbox-exporter[2677683]: ts=2025-01-29T11:23:46.668Z caller=level.go:71 module=ssh_banner target=db1170.mgmt.eqiad.wmnet:22 level=error msg="Probe failed" duration_seconds=5.000530174
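To illustrate why the level matters downstream, here is a minimal Python sketch that pulls the logfmt fields out of a syslog line like the ones above and keeps only error-level probe messages. The function names (parse_probe_line, is_probe_error) are hypothetical, and the parsing assumes the exporter payload always starts at the ts= field as in these samples; it is not part of blackbox-exporter or of our tooling.

```python
import re
import shlex

# Assumption: the exporter's logfmt payload starts at the first "ts=" field,
# after the syslog timestamp, hostname and program prefix.
PAYLOAD_RE = re.compile(r"(?:^|\s)(ts=.*)$")


def parse_probe_line(line):
    """Return the logfmt key/value pairs of one syslog line as a dict."""
    match = PAYLOAD_RE.search(line.strip())
    if match is None:
        return {}
    fields = {}
    # shlex handles quoted values such as msg="Error dialing TCP".
    for token in shlex.split(match.group(1)):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def is_probe_error(fields):
    """True when the line was logged at error level."""
    return fields.get("level") == "error"


# Sample line copied from the --log.prober error output above.
sample = (
    '2025-01-29T11:23:46.668788+00:00 filippo-prometheus-01 '
    'prometheus-blackbox-exporter[2677683]: ts=2025-01-29T11:23:46.668Z '
    'caller=level.go:71 module=ssh_banner target=db1170.mgmt.eqiad.wmnet:22 '
    'level=error msg="Error dialing TCP" '
    'err="dial tcp4 10.65.0.182:22: i/o timeout"'
)

fields = parse_probe_line(sample)
if is_probe_error(fields):
    print(fields["module"], fields["target"], fields["err"])
```

With probes logging at debug level, a filter like this would match nothing unless the whole exporter ran at debug verbosity, which is the gap the new flag closes.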
Implementation steps:
- import the prometheus-blackbox-exporter 0.25.0-2 package from testing into our internal bookworm repo (possibly other releases too? depends on the step below)
- audit which hosts use blackbox-exporter, in both cloud and production, and upgrade them to the version above
- change the exporter flags in modules/prometheus/templates/initscripts/prometheus-blackbox-exporter.systemd_override.erb to add the above --log.prober error