The Icinga Puppet tests were recently updated so that they only go into a warning state on a Puppet failure, and we instead send a critical alert only if a percentage of all hosts go into a failed state. However, this has caused us to miss failing hosts that perform a specific role and therefore don't reach the required percentage. As such, it would be useful to also go into an alerting state if Puppet hasn't run for an extended period of time, e.g. 24 hours.
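A minimal sketch of the requested behavior, assuming a Python check that follows the usual Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The names `puppet_status` and `last_run_age` and the 24-hour default are illustrative and are not taken from the existing check. A run older than the threshold escalates to CRITICAL whatever its result, while a recent failed run stays at WARNING. `last_run_age` mirrors the `stat -c %Z` approach (file status change time) that the current check uses.

```python
import os
import time

# Exit codes from the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2

# Illustrative threshold (assumption): treat a run older than 24 hours as stale.
STALE_SECONDS = 24 * 60 * 60


def puppet_status(age_seconds, last_run_failed, crit_seconds=STALE_SECONDS):
    """Map run age and last result to an exit code (hypothetical helper)."""
    if age_seconds >= crit_seconds:
        # Puppet hasn't run for too long: alert even if the last run succeeded.
        return CRITICAL
    if last_run_failed:
        # A recent failure only warns; fleet-wide percentages handle CRITICAL.
        return WARNING
    return OK


def last_run_age(path, now=None):
    """Seconds since `path`'s status change time, i.e. what `stat -c %Z` prints."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_ctime
```

For example, `puppet_status(25 * 3600, last_run_failed=False)` returns `CRITICAL`, so a single host in a specific role is caught even when the fleet-wide failure percentage is not reached.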
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
check_puppetrun: don't alert for disabled puppet agents for 1 day | operations/puppet | production | +8 -14
Related Objects
Event Timeline
Change 546165 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] check_puppetrun: dont alert for diabled puppet agents for 1 day
I took a look at how the module gets the last Puppet run data, and it just does the following: `stat -c %Z /var/lib/puppet/state/classes.txt`, which isn't really useful for our use case. I don't see any other metadata that gives the last failed state, so we may need to store it ourselves.
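One way to store that state ourselves, as a minimal sketch: after each agent run, a hypothetical wrapper records the outcome in a JSON file, keeping the timestamp of the last successful run separately from the last run. The path `/var/lib/puppet/state/last_success.json` and the functions `record_run`, `load_state` and `seconds_since_success` are assumptions for illustration only; they are not existing Puppet files or APIs. The check can then alert on time since the last *success* rather than on the ctime of `classes.txt`.

```python
import json
import os
import tempfile
import time

# Hypothetical state file written by our own wrapper; not a Puppet-managed file.
STATE_FILE = "/var/lib/puppet/state/last_success.json"


def load_state(path=STATE_FILE):
    """Return the stored state, or an empty dict if nothing has been recorded yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def record_run(success, path=STATE_FILE, now=None):
    """Record a Puppet run outcome, updating last_success only on success."""
    now = time.time() if now is None else now
    state = load_state(path)
    state["last_run"] = now
    state["last_run_failed"] = not success
    if success:
        state["last_success"] = now
    # Write to a temp file and rename so a concurrent check never sees a partial file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def seconds_since_success(path=STATE_FILE, now=None):
    """Seconds since the last successful run, or None if no success is recorded."""
    now = time.time() if now is None else now
    last = load_state(path).get("last_success")
    return None if last is None else now - last
```

Keeping `last_success` separate from `last_run` means a host that runs Puppet regularly but fails every time still becomes stale, which a plain "last run" timestamp would hide.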
Change 546165 merged by Jbond:
[operations/puppet@production] check_puppetrun: don't alert for disabled puppet agents for 1 day