
puppet last_run_summary.yaml incoherent when catalog can't compile
Closed, Resolved, Public

Description

Lots of monitoring, nagging, etc. depends on reading puppet's last_run_summary.yaml report. I have some VMs whose catalogs fail to compile; that yields a summary like this:

version:
  config:
  puppet: 5.5.10
resources:
  changed: 0
  corrective_change: 0
  failed: 0
  failed_to_restart: 0
  out_of_sync: 0
  restarted: 0
  scheduled: 0
  skipped: 0
  total: 0
time:
  fact_generation: 0.730059526860714
  node_retrieval: 0.34552243584766984
  plugin_sync: 0.6295351781882346
  total: 3.800249733
  last_run: 1603912853
changes:
  total: 0
events:
  failure: 0
  success: 0
  total: 0

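Puppet updates the last_run timestamp on every run, even when the catalog fails to compile, which I'm pretty sure is a change in behavior. The summary also reports zero failed resources and zero failure events, so those counters don't flag the run as failed either.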
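For anyone consuming this file from a monitoring script, here is a rough sketch of one way to avoid treating such a run as a success. It is a heuristic based only on the sample above, where version.config is empty and resources.total is 0; I have not checked that every failed compile looks like this, and it is not necessarily what the patch on this task does. The function names are made up for illustration, and the input is assumed to be the summary already parsed into a dict by whatever YAML loader you use.

```python
def looks_like_failed_compile(summary):
    """Heuristic: True if a parsed last_run_summary.yaml resembles the sample in this task."""
    version = summary.get("version") or {}
    resources = summary.get("resources") or {}
    # Assumption (from the sample only): a run that never got a catalog
    # has an empty version.config and manages zero resources.
    no_config_version = version.get("config") is None
    no_resources = resources.get("total", 0) == 0
    return no_config_version and no_resources


def last_success_timestamp(summary):
    """Return time.last_run only when the run does not look like a failed compile."""
    if looks_like_failed_compile(summary):
        return None
    return (summary.get("time") or {}).get("last_run")


# Parsed form of the summary from this task, trimmed to the relevant keys.
# An empty "config:" line in YAML parses to None.
failed_summary = {
    "version": {"config": None, "puppet": "5.5.10"},
    "resources": {"failed": 0, "total": 0},
    "time": {"total": 3.800249733, "last_run": 1603912853},
    "events": {"failure": 0, "success": 0, "total": 0},
}

print(looks_like_failed_compile(failed_summary))
print(last_success_timestamp(failed_summary))
```

A caller that gets None back would have to fall back to some other source for the last successful run time, since last_run by itself can't be trusted in that case.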

Event Timeline

Change 637510 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] puppet_alert.py: don't rely on last_run_summary.yaml for last success timestamp

https://gerrit.wikimedia.org/r/637510

Change 637510 merged by Andrew Bogott:
[operations/puppet@production] puppet_alert.py: don't rely on last_run_summary.yaml for last success timestamp

https://gerrit.wikimedia.org/r/637510

I've worked around the one specific case of this that was bothering me; I'm not going to dig into historical changes in puppet's YAML output today :)