Monitor the up-to-date status of wikitech-static
Closed, Resolved, Public

Description

Occasionally one of the jobs that updates wikitech-static breaks, and the static site gets out of sync with the actual wikitech. We need to detect and report this, somehow -- often it stays broken for weeks before I notice.

I can think of a few solutions but nothing that I love.

For instance, a cron on the wikitech host could make a daily edit that adds a date stamp to a pre-arranged page, and then shinken could check that same page on wikitech-static and confirm that it contains yesterday's date. Is there a better way?
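The checking half of the date-stamp idea could look roughly like the sketch below. It is only an illustration: check_stamp and expected_stamp are hypothetical names, the YYYY-MM-DD (UTC) stamp format is an assumption, and GNU date is assumed for "-d yesterday".

```shell
#!/bin/bash
# Sketch only: look for yesterday's date stamp in the text of a page.
# Assumptions: GNU date, and a cron on wikitech that writes stamps as YYYY-MM-DD (UTC).

expected_stamp() {
    # Yesterday's date in UTC, e.g. 2015-05-06
    date -u -d "yesterday" +%F
}

check_stamp() {
    # $1: page text to search
    # $2: stamp to look for (optional, defaults to yesterday's date)
    local text stamp
    text="$1"
    stamp="${2:-$(expected_stamp)}"
    if printf '%s\n' "$text" | grep -qF -- "$stamp"; then
        echo "OK: found stamp $stamp"
        return 0
    fi
    echo "CRITICAL: stamp $stamp not found"
    return 2
}
```

A check would fetch the pre-arranged page from wikitech-static (for example with curl -s) and pass its text as the first argument. The return codes follow the usual Nagios convention of 0 for OK and 2 for CRITICAL.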

Event Timeline

Andrew raised the priority of this task from to Medium.
Andrew updated the task description.
Andrew subscribed.

maybe just check the mtime of one of the files in the filesystem, to make sure files have been written on the remote host, without having to go through the actual wiki. there's check_file_age on neon in the nagios plugin dir.
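For illustration, the mtime idea amounts to something like the sketch below. This is not the check_file_age plugin itself, just a stand-in with a hypothetical function name (file_age_check), and it assumes GNU stat.

```shell
#!/bin/bash
# Sketch only: report CRITICAL when a file has not been modified recently.
# Assumption: GNU stat ("-c %Y" prints the mtime as seconds since the epoch).

file_age_check() {
    # $1: path of a file the sync job is expected to rewrite
    # $2: maximum acceptable age in seconds
    local file max_age now mtime age
    file="$1"
    max_age="$2"
    if [ ! -e "$file" ]; then
        echo "UNKNOWN: $file not found"
        return 3
    fi
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")
    age=$(( now - mtime ))
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL: $file is ${age}s old"
        return 2
    fi
    echo "OK: $file is ${age}s old"
    return 0
}
```

Which file to watch depends on how the sync job writes its output, so the path is left to the caller. A threshold of 90000 seconds would correspond to 25 hours.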

or, does the job write logs? could also use check_log to detect an error pattern there and be notified when the job fails

how does wikitech-static generation work? is there a job on wikitech that generates a static copy and pushes it to an external host, or does the external host pull it?

@fgiunchedi a daily cron job runs on wikitech, and wikitech-static fetches its output. Details here: https://wikitech.wikimedia.org/wiki/Wikitech-static

I propose that we have a monitoring test which uses api calls to compare the most recent edit date of

https://wikitech.wikimedia.org/wiki/Server_Admin_Log

with

https://wikitech-static.wikimedia.org/wiki/Server_Admin_Log

And alerts if they are more than 25 hours different. That's easy, I think?

ack, thanks! yep, comparing the two sounds easy enough

compare the most recent edit date
And alerts if they are more than 25 hours different

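#!/bin/bash
# Print the newest timestamp each wiki's API reports, for manual comparison.

# Note: "titles" only applies to prop= modules; list=recentchanges ignores it,
# so this query returns the most recent changes on the whole wiki.
API_QUERY="action=query&titles=Server_Admin_Log&list=recentchanges&format=xml"
WIKITECH="https://wikitech.wikimedia.org/w/api.php"
WIKITECHSTATIC="https://wikitech-static.wikimedia.org/w/api.php"

# Take the first timestamp="..." attribute in the XML response.
TS_W=$(curl -s "${WIKITECH}?${API_QUERY}" | grep -o "timestamp.*" | cut -d\" -f2)
TS_S=$(curl -s "${WIKITECHSTATIC}?${API_QUERY}" | grep -o "timestamp.*" | cut -d\" -f2)

echo "W: $TS_W"
echo "S: $TS_S"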

W: 2015-05-07T02:43:26Z
S: 2015-04-30T01:18:12Z
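
The remaining step, turning the two timestamps into an alert when they differ by more than 25 hours, could be sketched as follows. This is not the merged plugin: check_sync, to_epoch and MAX_LAG_SECONDS are hypothetical names, and GNU date is assumed for parsing.

```shell
#!/bin/bash
# Sketch only: compare two UTC timestamps and return a Nagios-style status.
# Assumption: GNU date ("-d" parses strings like "2015-05-07 02:43:26Z").

MAX_LAG_SECONDS=$((25 * 3600))   # the 25 hour threshold proposed in the task

to_epoch() {
    # "2015-05-07T02:43:26Z" -> seconds since the epoch
    date -u -d "$(printf '%s' "$1" | tr 'T' ' ')" +%s
}

check_sync() {
    # $1: timestamp from wikitech, $2: timestamp from wikitech-static
    local lag
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "UNKNOWN: missing timestamp"
        return 3
    fi
    lag=$(( $(to_epoch "$1") - $(to_epoch "$2") ))
    # Use the absolute difference so the argument order does not matter.
    if [ "$lag" -lt 0 ]; then
        lag=$(( -lag ))
    fi
    if [ "$lag" -gt "$MAX_LAG_SECONDS" ]; then
        echo "CRITICAL: timestamps differ by ${lag}s"
        return 2
    fi
    echo "OK: timestamps differ by ${lag}s"
    return 0
}
```

With the two timestamps printed above, check_sync "2015-05-07T02:43:26Z" "2015-04-30T01:18:12Z" reports a difference of 609914 seconds (a little over 7 days) and returns 2.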

Change 210637 had a related patch set uploaded (by Dzahn):
nagios plugin checks if wikitech-static is in sync

https://gerrit.wikimedia.org/r/210637

Change 210637 merged by Dzahn:
nagios plugin checks if wikitech-static is in sync

https://gerrit.wikimedia.org/r/210637

Change 210638 had a related patch set uploaded (by Dzahn):
wikitech: add monitoring::service for static sync

https://gerrit.wikimedia.org/r/210638

Change 210638 merged by Andrew Bogott:
wikitech: add monitoring::service for static sync

https://gerrit.wikimedia.org/r/210638

I guess now we should break it on purpose to test the test?

Change 210721 had a related patch set uploaded (by Dzahn):
wikitech-static monitoring, ? is illegal character

https://gerrit.wikimedia.org/r/210721

Change 210721 merged by Dzahn:
wikitech-static monitoring, ? is illegal character

https://gerrit.wikimedia.org/r/210721

Change 210724 had a related patch set uploaded (by Dzahn):
wikitech-static monitoring, add check command

https://gerrit.wikimedia.org/r/210724

Change 210730 had a related patch set uploaded (by Dzahn):
wikitech-static monitoring, install plugin

https://gerrit.wikimedia.org/r/210730

Change 210724 merged by Dzahn:
wikitech-static monitoring, add check command

https://gerrit.wikimedia.org/r/210724

Change 210730 merged by Dzahn:
wikitech-static monitoring, install plugin

https://gerrit.wikimedia.org/r/210730

Change 210739 had a related patch set uploaded (by Dzahn):
wikitech-static monitoring, fix template name

https://gerrit.wikimedia.org/r/210739

Change 210739 merged by Dzahn:
wikitech-static monitoring, fix check command setup

https://gerrit.wikimedia.org/r/210739