
jobs-loop sometimes cannot find PHP script
Closed, Declined (Public)

Description

Seen on job-runner03:

Main loop:
/bin/bash /usr/local/apache/common/php/extensions/WikimediaMaintenance/jobs-loop.sh

A child:
\_ php MWScript.php runJobs.php --wiki=The MediaWiki script file "./php-trunk/maintenance/nextJobDB.php" does not exist. --


Version: unspecified
Severity: normal

Details

Reference
bz37071

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 12:30 AM)
bzimport set Reference to bz37071.
bzimport added a subscriber: Unknown Object (MLST).

$ ll /usr/local/apache/
ls: /usr/local/apache/: Input/output error

Sounds bad ;-D

Looking at the process information for jobs-loop.sh, I found that its cwd pointed to a deleted path:

$ ls -l /proc/1234/cwd
lrwxrwxrwx 1 apache apache 0 2012-05-25 08:16 cwd -> /usr/local/apache/common-local/multiversion (deleted)

Although the directory is actually there :-(
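
The check above can be wrapped in a small helper. This is only a sketch: the function name cwd_is_deleted is made up for illustration, it assumes Linux with /proc mounted, and it needs permission to read the target process's /proc entry (same user or root). A directory whose real name ends in " (deleted)" would give a false positive.

```shell
#!/bin/sh
# Hypothetical helper (not part of jobs-loop.sh): report whether a
# process's working directory has been unlinked.
# Returns 0 if the cwd is marked "(deleted)", 1 if it is intact,
# 2 if /proc/PID/cwd cannot be read (no such PID, no permission).
cwd_is_deleted() {
    target=$(readlink "/proc/$1/cwd" 2>/dev/null) || return 2
    case "$target" in
        *" (deleted)") return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `cwd_is_deleted 1234 && echo "stale cwd"` would flag the process shown above.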

Restarting the loop (/etc/init.d/mw-job-runner) seems to fix the link:

$ ls -l /proc/6973/cwd
lrwxrwxrwx 1 apache apache 0 2012-05-25 08:24 /proc/6973/cwd -> /usr/local/apache/common-local/multiversion/

/usr/local/apache is an NFS mount:

deployment-nfs-memc:/mnt/export/apache on /usr/local/apache type nfs (rw,bg,soft,tcp,timeo=14,intr,nfsvers=3,addr=10.4.0.58)

I have no idea what could cause it to be unlinked. Maybe the NFS server moved the directory somehow, or whenever NFS has a connection issue the job runner servers permanently consider the file inaccessible.

I am marking 36646 - "get rid of NFS" as a dependency.

Are you sure you haven't deleted and recreated the directory since the process was started? If yes & it happens again, don't restart the process and notify me, I'd like to have a look.
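
The delete-and-recreate scenario is easy to reproduce on a local filesystem. The following is a sketch under assumptions: Linux with /proc, a temporary directory standing in for the real path, and `sleep` standing in for jobs-loop.sh. The function name demo_stale_cwd is made up. It shows that this sequence produces a "(deleted)" cwd even though the path exists again; it does not show that this is what happened on job-runner03 or that NFS behaves the same way.

```shell
#!/bin/sh
# Sketch: reproduce a "(deleted)" cwd by deleting and recreating a
# directory while a process is still running inside it.
demo_stale_cwd() {
    base=$(mktemp -d) || return 1
    mkdir "$base/multiversion"
    # Stand-in for the long-running loop, started inside the directory.
    ( cd "$base/multiversion" && exec sleep 30 >/dev/null ) &
    pid=$!
    sleep 1   # give the child time to chdir
    # Delete and recreate the same path.
    rm -rf "$base/multiversion"
    mkdir "$base/multiversion"
    # The path exists again, but the process still refers to the old,
    # unlinked directory, so the kernel reports it as "(deleted)".
    readlink "/proc/$pid/cwd"
    kill "$pid"
    wait "$pid" 2>/dev/null || true
    rm -rf "$base"
}

demo_stale_cwd
```

Relative paths resolved by such a process are looked up in the old, now empty directory, which would be consistent with a "does not exist" error for ./php-trunk/maintenance/nextJobDB.php.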

Lowering priority; I have not seen it occur again, I guess. Most probably someone renamed or otherwise altered the path.

I guess we can close the bug if it does not occur again over the next week or so.

This was some transient issue that I have not seen reproduced so far. So I am just closing this bug and will reopen it later on if it occurs again.