Description
The disk usage appears to come mostly from prometheus-pushgateway logs in /var/log/syslog.
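One way to confirm which program accounts for most of a syslog file is to total the bytes logged per program tag. The following is a minimal sketch, not the procedure used on prometheus1005: the function name `bytes_per_program` is hypothetical, and the regular expression assumes either the traditional (`Jun 27 23:00:01 host prog[pid]:`) or the ISO 8601 timestamp layout.

```python
import re
from collections import Counter

# Assumed line layouts (not taken from the host itself):
#   Jun 27 23:00:01 host prog[pid]: message
#   2025-06-27T23:00:01+00:00 host prog[pid]: message
_TAG = re.compile(
    r"^(?:[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d|\d{4}-\d\d-\d\dT\S+)"  # timestamp
    r"\s+\S+"          # hostname
    r"\s+([^\s:\[]+)"  # program tag, up to '[' or ':'
)


def bytes_per_program(lines):
    """Return a Counter mapping program tag -> total bytes logged."""
    totals = Counter()
    for line in lines:
        match = _TAG.match(line)
        tag = match.group(1) if match else "(unparsed)"
        totals[tag] += len(line.encode("utf-8", "replace"))
    return totals
```

Passing an open file object (for example `open("/var/log/syslog", errors="replace")`) and calling `.most_common(5)` on the result lists the five largest contributors.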
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| pushgateway: rotate logs hourly | operations/puppet | production | +30 -9 |
| prometheus: split pushgateway logs | operations/puppet | production | +5 -0 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | tappof | T398091 Prometheus1005 out of disk on / |
| Open | | None | T398092 Ensure pushgateway 1.11.0 avoids log spam when metric help strings are inconsistent |
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2025-06-27T23:21:09Z] <cwhite> truncate /var/log/syslog on prometheus1005 T398091
Change #1164862 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):
[operations/puppet@production] prometheus: split pushgateway logs
Change #1164862 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: split pushgateway logs
Mentioned in SAL (#wikimedia-operations) [2025-06-30T07:22:02Z] <godog> bounce prometheus-pushgateway on prometheus1005 - T398091
Mentioned in SAL (#wikimedia-operations) [2025-06-30T07:41:41Z] <godog> restart prometheus-pushgateway on prometheus1005 with fresh state - T398091
Thank you for taking a look, @colewhite! I have removed the pgw state (rm /var/lib/prometheus/pushgateway.data) and started pgw again to clear the existing metrics, so new pushes won't conflict again. Of course this is not ideal; my understanding is that newer (trixie) versions of pgw fixed this logging (to be investigated).
Change #1165020 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):
[operations/puppet@production] pushgateway: rotate logs hourly
Change #1165020 merged by Tiziano Fogli:
[operations/puppet@production] pushgateway: rotate logs hourly
To avoid future issues, Pushgateway now writes logs to a separate log file, managed by a dedicated logrotate rule based on file size (maximum 1 GB) and executed hourly.
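The size-based behaviour described above can be illustrated with a short sketch. This is not the merged Puppet or logrotate configuration: the function name `rotate_if_large`, the `keep` count, and the reading of "1 GB" as 2**30 bytes are all assumptions, and the check is meant to be called by an hourly scheduler.

```python
import os

ONE_GB = 1024 ** 3  # assumption: "1 GB" interpreted as 2**30 bytes


def rotate_if_large(path, max_bytes=ONE_GB, keep=3):
    """Rotate `path` if it is larger than `max_bytes`.

    Keeps at most `keep` old copies named path.1 .. path.<keep>.
    Returns True if a rotation happened, False otherwise.
    """
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return False
    # Drop the oldest copy, then shift path.N -> path.N+1.
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")
    os.replace(path, f"{path}.1")
    # Recreate an empty log file. A long-running writer would still hold
    # the old file open and must be told to reopen it.
    open(path, "a").close()
    return True
```

Running the check hourly bounds the file at roughly the size limit plus one hour of output, which is the point of pairing a size threshold with a frequent schedule.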