
job queue monitoring looks for 1.18 dir and fails / $wmfExtendedVersionNumber.php
Open, Low, Public

Description

currently all the "check_job_queue" checks on Nagios fail with:

JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors -

investigating this i saw that the problem does not appear to be in "check_job_queue" itself, but rather in CommonSettings.php; check_job_queue does not show this underlying error:

PHP Warning: require(/home/wikipedia/common/php-1.18/../wmf-config/ExtensionMessages-1.18.php): failed to open stream: No such file or directory in /home/wikipedia/common/wmf-config/CommonSettings.php on line 2506

and ..

PHP Fatal error: require(): Failed opening required '/home/wikipedia/common/php-1.18/../wmf-config/ExtensionMessages-1.18.php' (include_path='/home/wikipedia/common/php-1.20wmf2/extensions/OggHandler/PEAR/File_Ogg:/home/wikipedia/common/php-1.18:/home/wikipedia/common/php-1.18/lib:/usr/local/lib/php:/usr/share/php') in /home/wikipedia/common/wmf-config/CommonSettings.php on line 2506

that line 2506 in CommonSettings.php is:

require( "$wmfConfigDir/ExtensionMessages-$wmfExtendedVersionNumber.php" );

so it is looking in /php-1.18/ because $wmfExtendedVersionNumber is set to 1.18, and that setting seems outdated.
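A minimal sketch for checking this by hand, assuming the messages files sit directly in the config directory as the path in the error suggests. The function name `check_extension_messages` and its arguments are hypothetical and not part of the existing tooling:

```shell
#!/bin/bash
# Hypothetical helper (not part of wmf-config): check whether the
# ExtensionMessages file for a given version exists in a config directory,
# and list the files that do exist if it is missing.
check_extension_messages() {
    local config_dir="$1" version="$2"
    local wanted="$config_dir/ExtensionMessages-$version.php"
    if [ -r "$wanted" ]; then
        echo "found: $wanted"
        return 0
    fi
    echo "missing: $wanted"
    local f
    for f in "$config_dir"/ExtensionMessages-*.php; do
        # The glob stays literal when nothing matches, so test each entry.
        [ -e "$f" ] && echo "available: $f"
    done
    return 1
}

# Example call (path taken from the error message above):
#   check_extension_messages /home/wikipedia/common/wmf-config 1.18
```

If the version being asked for is not among the available files, the version lookup is what is stale, not the require line itself.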

Where should it be fixed?


Version: unspecified
Severity: normal

Details

Reference
bz36835

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:23 AM
bzimport set Reference to bz36835.
bzimport added a subscriber: Unknown Object (MLST).

15:37 < jeremyb> do you have anything in /home/wikipedia/common/wikiversion* ?

15:37 < mutante> where should it get the info from?
15:38 < mutante> yea, wikiversion.data
15:38 < mutante> .dat
15:38 < mutante> 2012-05-09
15:39 < mutante> the string "18" does not appear in the file
15:40 < mutante> and wikiversions.cdb , modified 05-10
15:41 < jeremyb> so, strace and find out which wikiversions file it's using? or if it's using one at all?

15:41 < jeremyb> 1.18 was once hardcoded into CommonSettings.php as a fallback. but not in the current cluster version so I'm looking elsewhere

15:43 < mutante> open("/usr/local/apache/common-local/wikiversions.cdb", O_RDONLY) = 3

15:43 < jeremyb> there you go

15:44 < jeremyb> does that (or its .dat) have 1.18?
15:45 < mutante> yes
15:45 < mutante> so "getMWVersion" should be changed to use /home ?
15:46 < mutante> or add mechanism to copy to /usr/local
15:46 < jeremyb> or -local should be made to be reliably up to date

13:48 mutante: copying outdated wikiversions.dat/.cdb files from /home to /usr/local on spence, which fixes check_job_queue (thanks jeremyb)
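To notice this kind of drift before the check breaks, the two copies could be compared directly. This is a rough sketch only; the function name `check_copy_in_sync` is made up for illustration, and the return values follow the usual Nagios convention (0 OK, 1 WARNING, 3 UNKNOWN):

```shell
#!/bin/bash
# Hypothetical helper: report whether a local copy of a file still matches
# the reference copy. Not an existing plugin; names are illustrative.
check_copy_in_sync() {
    local reference="$1" local_copy="$2"
    if [ ! -r "$reference" ] || [ ! -r "$local_copy" ]; then
        echo "UNKNOWN - cannot read $reference or $local_copy"
        return 3
    fi
    # cmp -s is silent and exits 0 only when the files are byte-identical.
    if cmp -s "$reference" "$local_copy"; then
        echo "OK - $local_copy matches $reference"
        return 0
    fi
    echo "WARNING - $local_copy differs from $reference"
    return 1
}

# Example call, using the two paths mentioned in the log above:
#   check_copy_in_sync /home/wikipedia/common/wikiversions.cdb \
#       /usr/local/apache/common-local/wikiversions.cdb
```

This only detects the stale copy; it does not settle whether the -local copy should be synced automatically or whether the check should read from /home.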

./check_job_queue JOBQUEUE OK - all job queues below 10,000

You should probably use the /usr/local/apache/common/php/maintenance/showJobs.php in some way or another.

We could just push all the MW files to spence...

Daniel: Is this still an issue, or can this be closed as obsolete?

what i wrote in 2012 is not an issue anymore. since then we switched to a single job_queue check, which was ok at some point but may be broken again, because it reports OK while the status information shows an error:

Current Status: OK (for 72d 17h 59m 48s)
Status Information: Could not open input file: /home/wikipedia/common/multiversion/MWScript.php
JOBQUEUE OK - all job queues below 10,000

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=neon&service=check_job_queue

#!/bin/bash
# nagios plugin to check the mediawiki job queue

LARGEQUEUES=
while read wiki count
do
        if [ ! $(echo "$count" | grep -E "^[0-9]+$") ]; then
                echo "JOBQUEUE CRITICAL - check plugin (`basename $0`) or PHP errors - $wiki"
                exit 2
        elif [ $count -gt 9999 ]; then
                LARGEQUEUES="$LARGEQUEUES, $wiki ($count)"
        fi
# The line below is a bash-ism that's needed for the LARGEQUEUES variable above to be in the right scope
# If you do php ... | while read wiki count; do LARGEQUEUE=blah; done , then the LARGEQUEUE variable will
# be manipulated in a subshell and the changes won't be visible to the if check below
done < <( php /home/wikipedia/common/multiversion/MWScript.php extensions/WikimediaMaintenance/getJobQueueLengths.php )

if [ -z "$LARGEQUEUES" ]; then
        echo "JOBQUEUE OK - all job queues below 10,000"
        exit 0
else
        echo "JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: $LARGEQUEUES"
        exit 2
fi

root@neon:/usr/lib/nagios/plugins# ./check_job_queue
Could not open input file: /home/wikipedia/common/multiversion/MWScript.php
JOBQUEUE OK - all job queues below 10,000

root@neon:~# cd /h/w/
-bash: cd: /h/w/: No such file or directory

of course, neon does not have /h/w (/home/wikipedia). spence did. this could never work on neon if it relies on that path
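The plugin quoted above reports OK whenever the loop reads no lines at all, which is what happens when the PHP script cannot be opened. A rough sketch of a variant that treats empty input as a failure follows. The function name `evaluate_job_queues` and the UNKNOWN wording are assumptions for illustration, not the deployed check:

```shell
#!/bin/bash
# Sketch only: classify "wiki count" lines read from stdin.
# Unlike the quoted plugin, reading zero lines counts as a failure.
# Return codes follow the Nagios convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.
evaluate_job_queues() {
    local threshold="${1:-9999}" large="" lines=0 wiki count
    while read -r wiki count; do
        lines=$((lines + 1))
        # Anything that is not "name <digits>" is treated as an error line.
        if ! [[ "$count" =~ ^[0-9]+$ ]]; then
            echo "JOBQUEUE CRITICAL - unexpected output for: $wiki"
            return 2
        fi
        if [ "$count" -gt "$threshold" ]; then
            large="$large, $wiki ($count)"
        fi
    done
    if [ "$lines" -eq 0 ]; then
        echo "JOBQUEUE UNKNOWN - no queue lengths were read"
        return 3
    fi
    if [ -n "$large" ]; then
        # ${large#,} drops the leading comma from the accumulated list.
        echo "JOBQUEUE CRITICAL - more than $threshold jobs:${large#,}"
        return 2
    fi
    echo "JOBQUEUE OK - all job queues at or below $threshold"
    return 0
}

# Example call, reusing the command from the plugin:
#   evaluate_job_queues < <( php /home/wikipedia/common/multiversion/MWScript.php \
#       extensions/WikimediaMaintenance/getJobQueueLengths.php )
```

With this shape, a missing MWScript.php on the monitoring host would surface as UNKNOWN instead of a long-running OK.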

@Dzahn: I guess it is no longer an issue, is it? I'll let you close it in that case.