
GWT freezing up - queues appear to be getting stuck
Closed, Resolved, Public

Description

I have previously noticed that my longer-running GWT uploads may have an hour where no uploads happen, and put this down to operational queue priorities. In the last day, however, upload queues appear not to be re-scheduling and have instead been waiting indefinitely. This is not due to upload failures, as the queues can be forced to restart temporarily when a new upload tranche is added.

As end users cannot see what is going on with the queues, they will naturally assume that uploads have either completed or fallen over. Gaps of several hours or days are likely to result in duplicate upload runs or, as has happened here, in more queues being added by accident without waiting for the last one to actually complete.

History of my uploads for the Historic American Buildings Survey project (see http://commons.wikimedia.org/w/index.php?title=Special%3AListFiles&limit=250&user=F%C3%A6 ):

06:26, 23 July 2014 - 10:33, 23 July 2014 (4 hour gap)

Temporarily restarted when a new queue was added.

10:34, 23 July 2014 - 15:00, 23 July 2014 (4.5 hour gap)

Temporarily restarted when a new queue (1801:2000) was added.

16:36, 23 July 2014 - 06:22, 24 July 2014

Another temporary restart when adding (2001:2300); files from the previous two upload tranches can also be seen at this restart time.

06:23, 24 July 2014 - [Current time, 13:02 in UK, 6.5 hours and waiting]

There should be around 15,000 images in the queue at this point, so I'm holding off on sending another tranche until we know what is happening.

Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=68285

Details

Reference
bz68506

Event Timeline

bzimport raised the priority of this task to High (Nov 22 2014, 3:29 AM).
bzimport set Reference to bz68506.
bzimport added a subscriber: Unknown Object (MLST).

Liam said:

Well, that's particularly unfortunate timing, as I understand that the GWT is going to be a feature item in the next edition of the Wikipedia Signpost, which will be published very soon...

At least some superficial debugging, to establish whether it is working at all, seems rather urgent so that we know what to tell users.

Aaron Schulz said:

The runJobs logs show lots of successful jobs by Fae, with some gwtoolsetUploadMediafileJob jobs failing with "An identical media file already exists under the title" or verification errors.

I'm not seeing any gwt jobs killed by the job runner timeout either (looking around some of the local log files). I'm not seeing anything in the fatal or exception logs.

showJobs.php shows:
gwtoolsetUploadMediafileJob: 1 queued; 485 claimed (0 active, 485 abandoned); 0 delayed
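The --group summary packs several counters into one line, and the telling part here is "0 active, 485 abandoned": every claimed job has been given up on, and none is currently being worked on. A minimal Python sketch for pulling those counters out, assuming the line format seen in this report (other MediaWiki versions may print it differently); parse_group_line and QueueSummary are hypothetical helpers, not part of showJobs.php.

```python
import re
from typing import NamedTuple

# Pattern inferred from the showJobs.php --group lines quoted in this report;
# it is an assumption, not a documented output format.
SUMMARY_RE = re.compile(
    r"^(?P<type>\w+): (?P<queued>\d+) queued; (?P<claimed>\d+) claimed "
    r"\((?P<active>\d+) active, (?P<abandoned>\d+) abandoned\); (?P<delayed>\d+) delayed$"
)


class QueueSummary(NamedTuple):
    job_type: str
    queued: int
    claimed: int
    active: int
    abandoned: int
    delayed: int


def parse_group_line(line: str) -> QueueSummary:
    """Parse one 'showJobs.php --group' summary line (hypothetical helper)."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    return QueueSummary(
        m["type"], int(m["queued"]), int(m["claimed"]),
        int(m["active"]), int(m["abandoned"]), int(m["delayed"]),
    )


if __name__ == "__main__":
    s = parse_group_line(
        "gwtoolsetUploadMediafileJob: 1 queued; 485 claimed (0 active, 485 abandoned); 0 delayed"
    )
    # Flag queues where claimed jobs exist but no runner is actively holding any.
    if s.claimed and not s.active:
        print(s.job_type, "has", s.abandoned, "abandoned claims and none active")
```

The same parser applies unchanged to the later --group output in this thread, e.g. the gwtoolsetUploadMetadataJob line with 8 queued and 0 claimed.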

Today's runJobs failures are:
2014-07-25 12:39:03 mw1014 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd80fd25 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=14735 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Lock 4 in 1935 Chesapeake and Ohio Canal from HABS.tif".>
2014-07-25 12:39:03 mw1003 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd81246c options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15400 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:South elevation corcoran art gallery.tif".>
2014-07-25 12:39:03 mw1004 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd8110e8 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15483 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:View from the s.w. corcoran art gallery.tif".>
2014-07-25 12:39:04 mw1015 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd814b59 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15739 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Detail s.w. corner corcoran art gallery.tif".>
2014-07-25 12:39:04 mw1009 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd8137f3 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=16090 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:West, 17th street elevation corcoran art gallery.tif".>
2014-07-25 13:39:38 mw1001 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd8137f3 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=13952 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:West, 17th street elevation corcoran art gallery.tif".>
2014-07-25 13:39:38 mw1011 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd81246c options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=14350 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:South elevation corcoran art gallery.tif".>
2014-07-25 13:39:38 mw1006 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd8110e8 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=14613 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:View from the s.w. corcoran art gallery.tif".>
2014-07-25 13:39:39 mw1013 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd814b59 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15130 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Detail s.w. corner corcoran art gallery.tif".>
2014-07-25 13:39:39 mw1012 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd80fd25 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15213 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Lock 4 in 1935 Chesapeake and Ohio Canal from HABS.tif".>
2014-07-25 13:54:33 mw1009 commonswiki: gwtoolsetUploadMediafileJob User:Ayaita/GWToolset/Mediafile_Batch_Job/53d2619982a1b options=array(3) whitelisted-post=array(31) user-name=Ayaita user-options=array(26) t=271 error=GWToolset\Jobs\UploadMediafileJob::run: The media file URL could not be evaluated. The URL delivers the content in a way that is not yet handled by this extension or there was an HTTP request issue. URL given was "http://mochila_images.s3.amazonaws.com/haciendo_canoa.jpg". HTTP request error "There was a problem during the HTTP request: 403 Forbidden".Array
2014-07-25 13:54:48 mw1005 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d2619923bcc options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=14537 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Elevation of addition of north elevation corcoran art gallery.tif".>
2014-07-25 13:54:48 mw1009 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d26199261f1 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=14607 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:2nd floor window detail corcoran art gallery.tif".>
2014-07-25 13:54:48 mw1013 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d2619922882 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15187 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:N.w.corner corcoran art gallery.tif".>
2014-07-25 13:54:48 mw1015 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d261992753c options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15048 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Corner pavillion corcoran art gallery.tif".>
2014-07-25 13:54:49 mw1012 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d2619924eb7 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15599 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Palladian window south elevation corcoran art gallery.tif".>
2014-07-25 14:54:51 mw1005 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d261992753c options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=13970 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Corner pavillion corcoran art gallery.tif".>
2014-07-25 14:54:51 mw1003 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d2619922882 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=13863 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:N.w.corner corcoran art gallery.tif".>
2014-07-25 14:54:51 mw1012 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d2619923bcc options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=14061 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Elevation of addition of north elevation corcoran art gallery.tif".>
2014-07-25 14:54:52 mw1008 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d2619924eb7 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=14094 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:Palladian window south elevation corcoran art gallery.tif".>
2014-07-25 14:54:53 mw1007 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53d26199261f1 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=15065 error=GWToolset\Jobs\UploadMediafileJob::run: <An identical media file already exists under the title "File:2nd floor window detail corcoran art gallery.tif".>

Also see:
zgrep gwtool mw-log/archive/runJobs.log-20140721.gz | grep 'error='
2014-07-20 11:36:02 mw1004 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cba95507fec options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=45531 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 11:46:02 mw1001 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbabae0f950 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=44379 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 11:56:03 mw1013 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbae08aaca3 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=36991 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 12:16:01 mw1014 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbb2bc9856d options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=32303 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 12:16:03 mw1015 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbb2bc97ea6 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=34837 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 12:26:01 mw1007 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbb517e0b68 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=33080 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 12:26:03 mw1005 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbb517e22e0 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=33321 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:06:01 mw1007 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbbe7eb2579 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=41850 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:06:01 mw1010 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbbe7eb3d17 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=40126 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:16:01 mw1016 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc0d9247e2 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=38841 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:16:03 mw1014 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc0d926797 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=39175 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:16:03 mw1010 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc0d9213b2 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=43421 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:26:01 mw1012 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc3329489f options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=36920 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:26:01 mw1003 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc33293f21 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=37341 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:36:01 mw1003 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc58d10a4c options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=36552 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:36:04 mw1002 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc58d0e850 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=40633 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:36:04 mw1011 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc589807c5 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=38778 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:46:01 mw1003 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc7e63d5c8 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=32513 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 13:46:03 mw1011 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cbc7e2b0daa options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=37831 error=GWToolset\Jobs\UploadMediafileJob::run: This file did not pass file verification.
2014-07-20 21:28:31 mw1007 commonswiki: gwtoolsetUploadMediafileJob User:Fæ/GWToolset/Mediafile_Batch_Job/53cc34597bed3 options=array(3) whitelisted-post=array(45) user-name=Fæ user-options=array(25) t=18713 error=GWToolset\Jobs\UploadMediafileJob::run: <An unknown error occurred in storage backend "local-swift-eqiad".

Aside from the failed "claimed" jobs, the queue is empty atm.
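The two log excerpts mix a handful of distinct failure causes, and the same job title sometimes fails more than once (for example .../53d24fd8137f3 at 12:39:04 and again at 13:39:38 on 25 July). A minimal Python sketch for tallying failures per cause and per job title, assuming the runJobs line layout shown here; the keyword buckets are an illustrative choice for these logs, not a taxonomy GWToolset defines, and tally() and classify() are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Layout assumed from the runJobs.log excerpts in this report:
# "<date> <time> <host> <wiki>: <job type> <job title> ... error=<message>"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<host>\S+) (?P<wiki>\S+): (?P<type>\S+) (?P<title>\S+) .*?error=(?P<error>.*)$"
)

# Keyword buckets picked for these logs only; first match wins.
BUCKETS = [
    ("duplicate", "An identical media file already exists"),
    ("verification", "did not pass file verification"),
    ("http", "HTTP request"),
    ("storage", "storage backend"),
]


def classify(error: str) -> str:
    for name, needle in BUCKETS:
        if needle in error:
            return name
    return "other"


def tally(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count failures per bucket and per job title; non-matching lines are skipped."""
    by_bucket: Counter = Counter()
    by_title: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        by_bucket[classify(m["error"])] += 1
        by_title[m["title"]] += 1
    return by_bucket, by_title


if __name__ == "__main__":
    # Option fields and the exception prefix trimmed for brevity.
    sample = [
        '2014-07-25 12:39:04 mw1009 commonswiki: gwtoolsetUploadMediafileJob '
        'User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd8137f3 t=16090 '
        'error=<An identical media file already exists under the title '
        '"File:West, 17th street elevation corcoran art gallery.tif".>',
        '2014-07-25 13:39:38 mw1001 commonswiki: gwtoolsetUploadMediafileJob '
        'User:Fæ/GWToolset/Mediafile_Batch_Job/53d24fd8137f3 t=13952 '
        'error=<An identical media file already exists under the title '
        '"File:West, 17th street elevation corcoran art gallery.tif".>',
        '2014-07-20 11:36:02 mw1004 commonswiki: gwtoolsetUploadMediafileJob '
        'User:Fæ/GWToolset/Mediafile_Batch_Job/53cba95507fec t=45531 '
        'error=This file did not pass file verification.',
    ]
    buckets, titles = tally(sample)
    print(buckets.most_common())
    print([title for title, n in titles.items() if n > 1])
```

Since tally() accepts any iterable of lines, a small script could also feed it sys.stdin and sit at the end of the zgrep pipeline quoted above.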

Fæ said:

(In reply to Aaron Schulz from comment #3)

Aside from the failed "claimed" jobs, the queue is empty atm.

In which case I don't understand how "ghost jobs" can be arising. There may be a deeper problem here.

For example, https://commons.wikimedia.org/wiki/File:-_Mumma_Farm,_House,_Smoketown_Road,_Sharpsburg,_Washington_County,_MD_HABS_MD,22-SHARP.V,30A-36.tif was uploaded at 14:56, 25 July 2014, but the job it belongs to (named "HABS 21 July 2014 (1601:1800)") has halted several times, restarting only when I set up a new queue. Each time it uploads a single batch of files within a minute or two and then halts again, indefinitely. The date (21 July) in the upload edit comment is the date the job was originally committed to GWT.

These jobs must still be in the queue, as they keep "resurrecting" and continuing to upload each time. Further, I cannot seem to start any new runs properly; they just behave in the same way.

Nothing significant has changed in my XML generation since the first runs, which have successfully uploaded over 130,000 files so far.

Other "jobs" that appear not to be visible in the queue have these upload comments:
HABS 24 July 2014 (2001:2300)
HABS 23 July 2014 (1801:2000)

I can, of course, just carry on creating more jobs, but as each job is around 7,000 images and starting a new job only gets another 20 uploaded from each stalled job, this looks broken.

I have created another job, "HABS 24 July 2014 (2301:2600)" (it should have been dated the 25th; that's a typo).

This re-kicked the other jobs: a further set of 20 images was immediately uploaded from each of these past jobs, which otherwise appear to have been cancelled, or at least had all halted indefinitely after their last uploads at 14:56, 25 July 2014 [UK time]:

  • HABS 23 July 2014 (1801:2000)
  • HABS 21 July 2014 (1601:1800)
  • HABS 24 July 2014 (2001:2300)

As with the past jobs, "HABS 24 July 2014 (2301:2600)" uploaded 20 images and then appears to have halted indefinitely at 19:27, 25 July 2014 [UK time], though only 20 minutes have passed as I write this, so it is worth double-checking in a few hours.

Each of these jobs includes several thousand images, so this is looking like a rather large backlog.

(In reply to Fæ from comment #4)

(In reply to Aaron Schulz from comment #3)

Aside from the failed "claimed" jobs, the queue is empty atm.

In which case I don't understand how "ghost jobs" can be arising. There may
be a deeper problem here.

I should note that failed claimed jobs can be retried after an hour or so (up to 3 tries in total, including the first). Nevertheless, that isn't the issue here.
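To make the retry rule concrete, here is a minimal Python sketch of how a periodic sweep might treat a claimed job that never finished. The one-hour claim TTL and the three-try cap are taken from the comment above; ClaimedJob and sweep_decision are hypothetical names, and this is not MediaWiki's JobQueueRedis code.

```python
from dataclasses import dataclass

# Assumed parameters, taken from the comment above rather than from MediaWiki's
# configuration: an unfinished claim is released after roughly an hour, and a
# job gets at most 3 tries in total (including the first).
CLAIM_TTL_SECONDS = 3600
MAX_TRIES = 3


@dataclass
class ClaimedJob:
    title: str
    attempts: int       # tries so far, including the first
    claimed_at: float   # UNIX time of the most recent claim


def sweep_decision(job: ClaimedJob, now: float) -> str:
    """What a periodic sweep would do with a claimed job that has not finished."""
    if now - job.claimed_at < CLAIM_TTL_SECONDS:
        return "wait"       # claim still fresh; a runner may still be working on it
    if job.attempts < MAX_TRIES:
        return "retry"      # put it back in the queue for another attempt
    return "give-up"        # out of tries; in this sketch it stays abandoned


if __name__ == "__main__":
    job = ClaimedJob("Mediafile_Batch_Job/53d24fd8137f3", attempts=1, claimed_at=0.0)
    for minutes in (30, 61):
        print(minutes, "min:", sweep_decision(job, now=minutes * 60.0))
```

This roughly hourly cadence is consistent with the earlier logs, where job .../53d24fd8137f3 failed at 12:39:04 and again at 13:39:38 on 25 July.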

However, from the description above it seems like the old jobs never come back until new ones are added (e.g. hours pass with nothing happening otherwise). Even right now I see:

aaron@terbium:~$ mwscript showJobs.php commonswiki --group | grep gwtoolset
gwtoolsetUploadMediafileJob: 0 queued; 138 claimed (0 active, 138 abandoned); 0 delayed
gwtoolsetUploadMetadataJob: 8 queued; 0 claimed (0 active, 0 abandoned); 0 delayed

aaron@terbium:~$ mwscript showJobs.php commonswiki --list | grep gwtoolset
gwtoolsetUploadMetadataJob User:Fæ/GWToolset/Metadata_Batch_Job/53d2a118b0f90 attempts=1 user-name=Fæ whitelisted-post=array(45) jobReleaseTimestamp=1406312788 status=unclaimed
gwtoolsetUploadMetadataJob User:Fæ/GWToolset/Metadata_Batch_Job/53d2a1152a052 attempts=1 user-name=Fæ whitelisted-post=array(45) jobReleaseTimestamp=1406312785 status=unclaimed
gwtoolsetUploadMetadataJob User:Fæ/GWToolset/Metadata_Batch_Job/53d2a1154d9b3 attempts=1 user-name=Fæ whitelisted-post=array(45) jobReleaseTimestamp=1406312785 status=unclaimed
gwtoolsetUploadMetadataJob User:Ayaita/GWToolset/Metadata_Batch_Job/53d2a113495f1 attempts=1 user-name=Ayaita whitelisted-post=array(31) jobReleaseTimestamp=1406312783 status=unclaimed
gwtoolsetUploadMetadataJob User:Fæ/GWToolset/Metadata_Batch_Job/53d2a11345628 attempts=1 user-name=Fæ whitelisted-post=array(45) jobReleaseTimestamp=1406312783 status=unclaimed
gwtoolsetUploadMetadataJob User:Fæ/GWToolset/Metadata_Batch_Job/53d2a11654ce5 attempts=1 user-name=Fæ whitelisted-post=array(45) jobReleaseTimestamp=1406312786 status=unclaimed
gwtoolsetUploadMetadataJob User:Fæ/GWToolset/Metadata_Batch_Job/53d2a11316624 attempts=1 user-name=Fæ whitelisted-post=array(45) jobReleaseTimestamp=1406312783 status=unclaimed
gwtoolsetUploadMetadataJob User:Fæ/GWToolset/Metadata_Batch_Job/53d2a113296a5 attempts=1 user-name=Fæ whitelisted-post=array(45) jobReleaseTimestamp=1406312783 status=unclaimed
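The jobReleaseTimestamp values are UNIX timestamps. Converting them with a short Python sketch (standard library only) puts all eight release times between 18:26:23 and 18:26:28 UTC on 25 July 2014, i.e. about 19:26 UK summer time, close to the 19:27 halt Fæ reported for "HABS 24 July 2014 (2301:2600)". BST is hard-coded as a fixed UTC+1 offset, which is only valid for summer dates.

```python
from datetime import datetime, timedelta, timezone

# jobReleaseTimestamp values copied from the showJobs.php --list output above.
RELEASE_TIMESTAMPS = [1406312788, 1406312785, 1406312785, 1406312783,
                      1406312783, 1406312786, 1406312783, 1406312783]

# British Summer Time: UTC+1 in July 2014 (fixed offset, no tz database needed).
BST = timezone(timedelta(hours=1), "BST")


def release_window(stamps):
    """Return the earliest and latest release times as timezone-aware UTC datetimes."""
    times = sorted(datetime.fromtimestamp(s, tz=timezone.utc) for s in stamps)
    return times[0], times[-1]


if __name__ == "__main__":
    first, last = release_window(RELEASE_TIMESTAMPS)
    print(first.isoformat(), "to", last.isoformat())
    # 2014-07-25T18:26:23+00:00 to 2014-07-25T18:26:28+00:00
    print("UK time:", first.astimezone(BST).strftime("%H:%M:%S"), "to",
          last.astimezone(BST).strftime("%H:%M:%S"))
    # UK time: 19:26:23 to 19:26:28
```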

So there are 8 jobs waiting to be claimed, yet nothing has claimed them. This is odd, since we have a dedicated runner for gwt jobs on each of 17 servers. This could happen if the queues are not listed in JobQueueAggregator (leaving the runners unaware that the queues are ready). Checking that gives:

aaron@terbium:~$ mwscript eval.php testwiki

print_r( JobQueueAggregator::singleton()->getAllReadyWikiQueues() );

Array
(

[ParsoidCacheUpdateJobOnEdit] => Array
    (
        [0] => fawiki
        [1] => warwiki
        [2] => wikidatawiki
        [3] => ruwiki
        [4] => hewiki
        [5] => eswiki
        [6] => arwiki
        [7] => frwiki
        [8] => plwiki
        [9] => mgwiktionary
        [10] => hywiki
        [11] => mediawikiwiki
        [12] => enwiktionary
        [13] => frwiktionary
        [14] => svwiki
        [15] => zhwiki
    )

[refreshLinks] => Array
    (
        [0] => itwiki
        [1] => frwiki
        [2] => commonswiki
        [3] => enwiki
        [4] => enwiktionary
        [5] => ruwiki
    )

[cirrusSearchLinksUpdatePrioritized] => Array
    (
        [0] => metawiki
        [1] => enwiktionary
        [2] => svwiki
        [3] => eswiki
        [4] => hewiki
        [5] => fawiki
        [6] => ruwiki
        [7] => arwiki
        [8] => hywiki
        [9] => frwiki
        [10] => plwiki
        [11] => warwiki
        [12] => wikidatawiki
        [13] => commonswiki
        [14] => dewiki
        [15] => itwiki
    )

[ParsoidCacheUpdateJobOnDependencyChange] => Array
    (
        [0] => eswiki
        [1] => warwiki
        [2] => cawiktionary
        [3] => wikidatawiki
        [4] => plwiki
        [5] => shwiktionary
        [6] => mediawikiwiki
        [7] => svwiki
        [8] => frwiki
        [9] => enwiktionary
        [10] => mgwiktionary
        [11] => hewiki
        [12] => itwiki
        [13] => frwiktionary
        [14] => arwiki
        [15] => nlwiki
        [16] => commonswiki
        [17] => fawiki
        [18] => enwiki
        [19] => ruwiki
        [20] => hywiki
        [21] => shwiki
        [22] => dewiki
    )

[cirrusSearchLinksUpdate] => Array
    (
        [0] => mediawikiwiki
        [1] => enwiki
        [2] => frwiki
        [3] => enwiktionary
        [4] => lawiki
        [5] => euwiki
        [6] => itwiki
        [7] => commonswiki
        [8] => ruwiki
    )

[cirrusSearchOtherIndex] => Array
    (
        [0] => enwiki
    )

[cirrusSearchLinksUpdateSecondary] => Array
    (
        [0] => wikidatawiki
        [1] => frwiktionary
        [2] => dewiki
        [3] => svwiki
        [4] => enwiki
        [5] => shwiki
        [6] => frwiki
        [7] => ptwiki
    )

[htmlCacheUpdate] => Array
    (
        [0] => warwiki
        [1] => shwiki
        [2] => dewiki
        [3] => enwiktionary
        [4] => frwiki
        [5] => ruwiki
        [6] => enwiki
    )

[enotifNotify] => Array
    (
        [0] => plwiki
        [1] => commonswiki
        [2] => dewiki
        [3] => eswiki
    )

[MessageUpdateJob] => Array
    (
        [0] => mediawikiwiki
    )

)

No entries for the gwt jobs. Since the jobs above had jobReleaseTimestamp=<X>, they must have started off delayed and later become available; maybe the aggregator wasn't notified at that point, which would explain some of these problems. I don't see any bugs in executeReadyPeriodicTasks() in redisJobRunner offhand, but I'll see if I can find anything.
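The hypothesis above can be illustrated with a small Python toy model. It assumes, as suggested in this thread rather than taken from MediaWiki's source, that runners poll only the queues the aggregator lists as ready, that pushing an immediately runnable job notifies the aggregator, and that a delayed job which later becomes runnable does not. Under those assumptions it reproduces the symptom Fæ describes: released jobs sit unclaimed until a new tranche is pushed to the same queue, at which point the runner drains the old jobs along with the new one. ToyQueue, ToyAggregator, push, periodic_tasks and run_once are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Toy model only: not MediaWiki's JobQueueRedis/JobQueueAggregator code.


@dataclass
class ToyQueue:
    ready: List[str] = field(default_factory=list)
    delayed: List[Tuple[float, str]] = field(default_factory=list)  # min-heap on release time


class ToyAggregator:
    def __init__(self) -> None:
        self.ready_queues: Set[str] = set()


def push(agg: ToyAggregator, queues: Dict[str, ToyQueue], name: str, job: str,
         now: float, release_at: float = 0.0) -> None:
    q = queues[name]
    if release_at > now:
        heapq.heappush(q.delayed, (release_at, job))  # delayed: nobody is told
    else:
        q.ready.append(job)
        agg.ready_queues.add(name)                     # immediate push notifies


def periodic_tasks(agg: ToyAggregator, queues: Dict[str, ToyQueue], now: float,
                   notify_on_release: bool = False) -> None:
    """Release due delayed jobs; the suspected bug is notify_on_release=False."""
    for name, q in queues.items():
        released = False
        while q.delayed and q.delayed[0][0] <= now:
            q.ready.append(heapq.heappop(q.delayed)[1])
            released = True
        if released and notify_on_release:
            agg.ready_queues.add(name)


def run_once(agg: ToyAggregator, queues: Dict[str, ToyQueue]) -> List[str]:
    """One runner pass: drain only the queues the aggregator lists as ready."""
    done: List[str] = []
    for name in sorted(agg.ready_queues):
        done.extend(queues[name].ready)
        queues[name].ready.clear()
    agg.ready_queues.clear()  # every listed queue has just been drained
    return done


if __name__ == "__main__":
    agg, queues = ToyAggregator(), {"gwt": ToyQueue()}
    push(agg, queues, "gwt", "HABS (1601:1800) next batch", now=0, release_at=60)
    periodic_tasks(agg, queues, now=120)
    print(run_once(agg, queues))  # []: the job is runnable but invisible to runners
    push(agg, queues, "gwt", "HABS (2301:2600) first batch", now=180)
    print(run_once(agg, queues))  # both jobs run once the new push notifies
```

Passing notify_on_release=True to periodic_tasks() makes the released job visible on the next pass, which is the effect the manual notifyQueueNonEmpty() calls later in this thread aim for.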

Change 149547 had a related patch set uploaded by Aaron Schulz:
Removed use of cache in JobQueueFederated pop() method

https://gerrit.wikimedia.org/r/149547

Change 149583 had a related patch set uploaded by Aaron Schulz:
Removed use of cache in JobQueueFederated pop() method

https://gerrit.wikimedia.org/r/149583

ayaita17 wrote:

(In reply to Aaron Schulz from comment #2)

The runJobs logs show lots of successful jobs by Fae, with some
gwtoolsetUploadMediafileJob jobs failing with "An identical media file
already exists under the title" or verification errors.

I'm not seeing any gwt jobs killed by the job runner timeout either (looking
around some of the local log files). I'm not seeing anything in the fatal
or exception logs.

showJobs.php shows:
gwtoolsetUploadMediafileJob: 1 queued; 485 claimed (0 active, 485
abandoned); 0 delayed

Today's runJobs failures are:

2014-07-25 13:54:33 mw1009 commonswiki: gwtoolsetUploadMediafileJob
User:Ayaita/GWToolset/Mediafile_Batch_Job/53d2619982a1b options=array(3)
whitelisted-post=array(31) user-name=Ayaita user-options=array(26) t=271
error=GWToolset\Jobs\UploadMediafileJob::run: The media file URL could not
be evaluated. The URL delivers the content in a way that is not yet handled
by this extension or there was an HTTP request issue. URL given was
"http://mochila_images.s3.amazonaws.com/haciendo_canoa.jpg". HTTP request error
"There was a problem during the HTTP request: 403 Forbidden".Array

Hi, I am user Ayaita and my batch stopped uploading pictures. I just saw my job here and was wondering what this failure means and what I should do to restart the job.

Change 149547 merged by Ori.livneh:
Removed use of cache in JobQueueFederated pop() method

https://gerrit.wikimedia.org/r/149547

Change 149583 merged by jenkins-bot:
Removed use of cache in JobQueueFederated pop() method

https://gerrit.wikimedia.org/r/149583

Change 149911 had a related patch set uploaded by Aaron Schulz:
Removed use of cache in JobQueueFederated pop() method

https://gerrit.wikimedia.org/r/149911

Change 149911 merged by jenkins-bot:
Removed use of cache in JobQueueFederated pop() method

https://gerrit.wikimedia.org/r/149911

I backported the changes and kicked the aggregator:

JobQueueAggregator::singleton()->notifyQueueNonEmpty( 'commonswiki', 'gwtoolsetUploadMediafileJob' );
JobQueueAggregator::singleton()->notifyQueueNonEmpty( 'commonswiki', 'gwtoolsetUploadMetadataJob' );
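The two calls above are a one-off manual repair. A periodic check that does the same thing automatically could compare the queues that actually hold runnable jobs with those the aggregator reports as ready, and re-notify any that are missing. A minimal Python sketch of that idea, in which reconcile() and its inputs are hypothetical: in MediaWiki terms they would correspond roughly to per-queue counts like those showJobs.php prints and to getAllReadyWikiQueues(), and the notify callback stands in for something like notifyQueueNonEmpty(). This is not an existing maintenance script.

```python
from typing import Callable, Dict, List, Set, Tuple

# (wiki, job type), e.g. ("commonswiki", "gwtoolsetUploadMetadataJob")
QueueKey = Tuple[str, str]


def reconcile(ready_counts: Dict[QueueKey, int],
              aggregator_ready: Set[QueueKey],
              notify: Callable[[str, str], None]) -> List[QueueKey]:
    """Re-notify each queue that holds runnable jobs but is missing from the ready list."""
    missing = sorted(key for key, count in ready_counts.items()
                     if count > 0 and key not in aggregator_ready)
    for wiki, job_type in missing:
        notify(wiki, job_type)
    return missing


if __name__ == "__main__":
    # Queued counts from the showJobs.php output earlier in this thread;
    # the aggregator listed no gwt queues as ready at that point.
    counts = {
        ("commonswiki", "gwtoolsetUploadMediafileJob"): 0,
        ("commonswiki", "gwtoolsetUploadMetadataJob"): 8,
    }
    reconcile(counts, set(), lambda wiki, job_type: print("notify", wiki, job_type))
    # prints: notify commonswiki gwtoolsetUploadMetadataJob
```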

Gilles raised the priority of this task from High to Unbreak Now! (Dec 4 2014, 10:11 AM)
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to High (Dec 4 2014, 11:20 AM)