Add monitoring for planet updating
Closed, Resolved · Public

Description

Follow-up from T203055: en.planet hasn't updated since July 25.

We should have a monitoring check that verifies the planet cron job is running properly. I suggested checking that a header exists for $day - 1, since there's always at least one post each day. If the cron job fails, it might take us a full day to notice, but I think that's acceptable for now.
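
The "$day - 1" header check suggested above could look roughly like the following. This is only a minimal sketch: the function name `has_header_for_previous_day` and the header date format (`%B %d, %Y`, e.g. "August 30, 2018") are assumptions for illustration, not taken from the actual planet templates or from the plugin that was eventually written.

```python
from datetime import date, timedelta


def has_header_for_previous_day(
    page_html: str,
    today: date,
    header_format: str = "%B %d, %Y",  # assumed header format, not confirmed
) -> bool:
    """Return True if the page contains a date header for the day before `today`."""
    yesterday = today - timedelta(days=1)
    # Render yesterday's date the way the page is assumed to print it,
    # then look for that string anywhere in the fetched HTML.
    return yesterday.strftime(header_format) in page_html
```

A monitoring wrapper would fetch the planet front page, call this with the current date, and alert when it returns False. The trade-off is the one noted above: a failed cron job can go unnoticed for up to a day.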

Event Timeline

Legoktm created this task. Aug 31 2018, 2:11 AM
Restricted Application added a subscriber: Aklapper. Aug 31 2018, 2:11 AM
Dzahn claimed this task. Aug 31 2018, 1:23 PM

Change 472713 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] icinga/planet: add plugin to check planet content updates

https://gerrit.wikimedia.org/r/472713

Change 472713 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] icinga/planet: add generic check_lastmod plugin and check planet updates

https://gerrit.wikimedia.org/r/472713

Change 472713 merged by Dzahn:
[operations/puppet@production] icinga/planet: add generic check_lastmod plugin and check planet updates

https://gerrit.wikimedia.org/r/472713

Change 498807 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] nagios_common: add check command check_lastmod

https://gerrit.wikimedia.org/r/498807

Change 498807 merged by Dzahn:
[operations/puppet@production] nagios_common: add check command check_lastmod

https://gerrit.wikimedia.org/r/498807

Change 498810 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] nagios_common: add command config for lastmod plugin

https://gerrit.wikimedia.org/r/498810

Change 498810 merged by Dzahn:
[operations/puppet@production] nagios_common: add command config for lastmod plugin

https://gerrit.wikimedia.org/r/498810

Change 498819 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] nagios_common: another fix to check_lastmod location and naming

https://gerrit.wikimedia.org/r/498819

Change 498819 merged by Dzahn:
[operations/puppet@production] nagios_common: another fix to check_lastmod location and naming

https://gerrit.wikimedia.org/r/498819

Change 498835 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] icinga/planet: merge monitoring into a single host and class

https://gerrit.wikimedia.org/r/498835

Change 498835 merged by Dzahn:
[operations/puppet@production] icinga/planet: merge monitoring into a single host and class

https://gerrit.wikimedia.org/r/498835

Mentioned in SAL (#wikimedia-operations) [2019-03-25T13:28:40Z] <mutante> planet - manually updating en version since new monitoring check warned it wasn't current (T203208)

Change 498865 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] icinga/planet: fix seconds->hour calc and add comments

https://gerrit.wikimedia.org/r/498865

Change 498865 merged by Dzahn:
[operations/puppet@production] icinga/planet: fix seconds->hour calc and add comments

https://gerrit.wikimedia.org/r/498865

Dzahn closed this task as Resolved. Mar 25 2019, 2:19 PM
10:17 <+icinga-wm> RECOVERY - check updates on en.planet.wikimedia.org on en.planet.wikimedia.org is OK: OK - Website content is current (70796 = 86400) https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org
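
For readers unfamiliar with this kind of check, the recovery message above compares the content age in seconds with a maximum age of 86400 seconds (24 hours). The sketch below shows one way such a comparison might work. It is not the merged check_lastmod plugin: the function name, the use of an HTTP Last-Modified value, the message wording, and the choice of CRITICAL for stale content are all assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_lastmod(last_modified: str, now: datetime, max_age_hours: int = 24):
    """Compare the age of a Last-Modified timestamp with a threshold.

    `last_modified` is an RFC 2822 / HTTP-style date string (assumed input).
    `now` must be a timezone-aware datetime.
    Returns a (exit_code, message) tuple.
    """
    # Convert the threshold from hours to seconds before comparing.
    max_age_seconds = max_age_hours * 3600
    modified = parsedate_to_datetime(last_modified)
    age_seconds = int((now - modified).total_seconds())
    if age_seconds <= max_age_seconds:
        return OK, f"OK - content is current ({age_seconds} <= {max_age_seconds})"
    # Severity for stale content is an assumption in this sketch.
    return CRITICAL, f"CRITICAL - content is stale ({age_seconds} > {max_age_seconds})"
```

Passing the current time in as a parameter keeps the function testable without network access; a real plugin would fetch the header, call the function with `datetime.now(timezone.utc)`, print the message, and exit with the returned code.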