PuppetFailure - gerrit2003
Closed, Resolved (Public)

Description

Common information

  • alertname: PuppetFailure
  • cluster: misc
  • instance: gerrit2003:9100
  • job: node
  • prometheus: ops
  • severity: critical
  • site: codfw
  • source: prometheus
  • team: collaboration-services

Firing alerts


Event Timeline

Dzahn renamed this task from PuppetFailure to PuppetFailure - gerrit2003. Sep 17 2024, 2:09 PM
Dzahn claimed this task.
Dzahn subscribed.

Caused by testing the gerrit role on bookworm. The host is not in production yet.

This should be fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1073308

Waiting for that to be merged.

It is unclear how the issue was fixed on the current prod host. Probably a manual command copied the image into place.

Dzahn triaged this task as Low priority. Sep 17 2024, 2:19 PM

Puppet is not failing anymore. Apache is still failing due to a certificate issue, but Puppet itself is fine.

Apache is also fixed and running.

No more Puppet errors or webserver issues here:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1074498