
Improve documentation about Toolforge checker services
Closed, Resolved, Public

Description

Document all services running on tools-checker-* nodes.

Explain how the checks are made, how they are made available to monitoring and how Icinga uses them.

Add possible troubleshooting steps.

Event Timeline

The toolschecker tool has the following crontab:

*/5 * * * *  /usr/bin/jsub -N cron-tools.toolschecker-1 -once -quiet touch /data/project/toolschecker/crontest.txt
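The `*/5` minute field fires at minutes 0, 5, ..., 55 of every hour, so consecutive runs are 5 minutes apart, including across the hour boundary. A small illustrative sketch; the helper `expand_step_field` is hypothetical and handles only `*` and `*/N`, not full cron syntax:

```python
def expand_step_field(field: str, low: int = 0, high: int = 59) -> list:
    """Expand a cron field of the form '*' or '*/N' into the values it matches.

    Only '*' and '*/N' are supported; ranges and comma lists are out of scope.
    """
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        if step <= 0:
            raise ValueError(f"invalid step: {field}")
        return list(range(low, high + 1, step))
    raise ValueError(f"unsupported field: {field}")


minutes = expand_step_field("*/5")
print(len(minutes), minutes[:3], minutes[-1])  # 12 [0, 5, 10] 55

# Gaps between consecutive runs, including the wrap-around into the next hour.
gaps = [b - a for a, b in zip(minutes, minutes[1:])] + [60 - minutes[-1] + minutes[0]]
print(max(gaps))  # 5
```

With runs 5 minutes apart, the 10-minute window in `cron_check()` tolerates roughly one missed or late run.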

toolschecker.py contains the corresponding check:

import os
import time

# `check` is a decorator defined elsewhere in toolschecker.py; it presumably
# exposes the function at the given URL path for monitoring.
@check('/toolscron')
def cron_check():
    ''' A tools cron job touches a file every five minutes.  This test verifies
        that the mtime is appropriately recent.'''
    filepath = '/data/project/toolschecker/crontest.txt'
    tenminutes = 60 * 10
    mtime = os.path.getmtime(filepath)
    if time.time() - mtime < tenminutes:
        return True
    return False
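The definition of the `check` decorator is not quoted here. One plausible way such a decorator could turn boolean checks into something Icinga can poll is to register each function under its URL path and map the result to an HTTP status. A minimal stdlib-only sketch; `CHECKS`, `run_check` and the 200/503/404 convention are assumptions for illustration, not taken from toolschecker.py:

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping URL paths to check functions.
CHECKS: Dict[str, Callable[[], bool]] = {}


def check(path: str):
    """Register a check function under a URL path (illustrative sketch)."""
    def decorator(func: Callable[[], bool]) -> Callable[[], bool]:
        CHECKS[path] = func
        return func
    return decorator


def run_check(path: str) -> Tuple[int, str]:
    """Run the check registered at `path` and map its result to an HTTP status.

    Assumed convention: 200 on success, 503 on failure or exception,
    404 if nothing is registered at that path.
    """
    func = CHECKS.get(path)
    if func is None:
        return 404, "no such check"
    try:
        ok = func()
    except Exception as exc:  # a crashing check counts as a failure
        return 503, f"caught exception: {exc}"
    return (200, "OK") if ok else (503, "NOT OK")


@check("/always-ok")
def always_ok() -> bool:
    return True


@check("/always-fails")
def always_fails() -> bool:
    return False


@check("/crashes")
def crashes() -> bool:
    raise OSError("stale NFS file handle")


print(run_check("/always-ok"))     # (200, 'OK')
print(run_check("/always-fails"))  # (503, 'NOT OK')
print(run_check("/crashes"))       # (503, 'caught exception: stale NFS file handle')
```

Under this assumption, an Icinga HTTP probe such as `check_http` would only need to request each path and alert on any non-200 response.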

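So this appears to check that:

  1. tools-cron-* is working
  2. new jobs can be submitted (via jsub) every 5 minutes
  3. jobs can write to an NFS-mounted filesystem
  4. scheduling delays and time drift together stay under about 5 minutes (the check accepts a file up to 10 minutes old, twice the 5-minute cron interval)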
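As a possible troubleshooting step, looking at the file's actual age (rather than only the pass/fail result) helps narrow down which of the conditions above is failing. A minimal sketch, not part of toolschecker; `describe_cron_file`, `CRON_INTERVAL` and `MAX_AGE` are hypothetical names, and the constants simply mirror the crontab and `cron_check()`. The 10-minute limit minus the 5-minute interval leaves about 5 minutes of slack:

```python
import os
import tempfile
import time
from typing import Optional

CRON_INTERVAL = 5 * 60  # seconds between runs of the */5 crontab entry
MAX_AGE = 10 * 60       # threshold used by cron_check()


def describe_cron_file(filepath: str, now: Optional[float] = None) -> str:
    """Return a human-readable verdict on how recently `filepath` was touched."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(filepath)
    except FileNotFoundError:
        return "MISSING: the cron job may never have run, or the NFS path is not mounted"
    age = now - mtime
    if age < 0:
        # cron_check() would still pass here, but a future mtime hints at clock drift.
        return f"FUTURE: mtime is {-age:.0f}s ahead of this host's clock"
    if age < MAX_AGE:
        slack = MAX_AGE - CRON_INTERVAL  # how late a run may be and still pass
        return f"OK: touched {age:.0f}s ago (limit {MAX_AGE}s, slack {slack}s)"
    return f"STALE: touched {age:.0f}s ago; check tools-cron-*, job submission and NFS"


# Demo against a temporary file instead of the real NFS path.
with tempfile.NamedTemporaryFile() as tmp:
    now = int(time.time())
    os.utime(tmp.name, (now - 60, now - 60))
    print(describe_cron_file(tmp.name, now))  # OK: touched 60s ago (limit 600s, slack 300s)
```

On a host with the tool's NFS share mounted, `describe_cron_file('/data/project/toolschecker/crontest.txt')` would report on the real file.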

This tool also has a broken ~/public_html/index.php file that says "Lighttpd works".

GTirloni triaged this task as Medium priority.