
Explain/Investigate low number of giftbot queue jobs
Closed, Resolved · Public

Description

Every fortnight, tools.giftbot runs an array job of 200 parallel tasks on the giftbot queue. I know that the individual tasks are accepted into the queue one by one until all 200 are running. But for quite some time now I have observed that they don't all run at the same time (circa 85 run while 115 pile up waiting). Has something changed in the config? Is it possibly something on my end? Can I have the old capacity back, or do I have to settle for less?
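For reference, a Grid Engine array job like this one is submitted with qsub's `-t` option, and `-q` restricts it to a named queue. The following Python sketch only builds the command line and does not run it; the helper `build_array_job_cmd` and the script name `giftbot_task.sh` are hypothetical. The optional `-tc` argument, which caps how many tasks of one array may run at the same time, is assumed to be supported by the installed Grid Engine version.

```python
from typing import List, Optional


def build_array_job_cmd(script: str, n_tasks: int, queue: str,
                        max_concurrent: Optional[int] = None) -> List[str]:
    """Build (but do not run) a qsub command line for a Grid Engine array job.

    build_array_job_cmd is a hypothetical helper for illustration only.
    """
    if n_tasks < 1:
        raise ValueError("an array job needs at least one task")
    # -t 1-N creates tasks 1..N; -q limits scheduling to the given queue
    cmd = ["qsub", "-t", f"1-{n_tasks}", "-q", queue]
    if max_concurrent is not None:
        # -tc (assumed available) limits concurrently running tasks of this array
        cmd += ["-tc", str(max_concurrent)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # giftbot_task.sh is a hypothetical script name
    print(" ".join(build_array_job_cmd("giftbot_task.sh", 200, "giftbot")))
```

Running it prints the qsub invocation for a 200-task array on the giftbot queue; passing the result to `subprocess.run` on a grid submit host would actually submit it.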

Event Timeline

Restricted Application added a project: Cloud-Services. · Jan 19 2017, 11:00 PM
Restricted Application added a subscriber: Aklapper.
scfc triaged this task as Low priority. · Feb 16 2017, 11:21 PM
scfc added a subscriber: scfc.

I don't remember any changes in the configuration, but looking just now at qstat -j 1227342:

[…]
scheduling info:            queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1403.eqiad.wmflabs" dropped because it is temporarily not available
                            queue instance "giftbot@tools-exec-gift.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=7.430000 (= 7.430000 + 0.50 * 0.000000 with nproc=2) >= 2.00
                            queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1201.eqiad.wmflabs" dropped because it is disabled
                            cannot run in queue "webgrid-lighttpd" because it is not contained in its hard queue list (-q)
                            cannot run in queue "mailq" because it is not contained in its hard queue list (-q)
                            cannot run in queue "task" because it is not contained in its hard queue list (-q)
                            cannot run in queue "continuous" because it is not contained in its hard queue list (-q)
                            cannot run in queue "webgrid-generic" because it is not contained in its hard queue list (-q)

and the load on tools-exec-gift is > 10, so why should the grid run more jobs if the node is already saturated?
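The numbers behind that answer can be read off the scheduling line. In Grid Engine, `np_load_avg` is the load average divided by the number of processors, so `np_load_avg=7.43` with `nproc=2` corresponds to a raw load average of about 14.9, and the giftbot queue instance is skipped because 7.43 is at or above 2.00 (presumably the queue's configured load threshold). The Python sketch below extracts those values from such a line; it is an illustration, not Toolforge tooling, the names `parse_overload` and `OverloadInfo` are hypothetical, and the regular expression assumes exactly the message format shown above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the "overloaded" scheduling-info message printed by `qstat -j`
# (format assumed from the output above).
_OVERLOAD_RE = re.compile(
    r'queue instance "(?P<qi>[^"]+)" dropped because it is overloaded: '
    r'np_load_avg=(?P<val>[\d.]+) \(= [\d.]+ \+ [\d.]+ \* [\d.]+ '
    r'with nproc=(?P<nproc>\d+)\) >= (?P<thr>[\d.]+)'
)


@dataclass
class OverloadInfo:
    queue_instance: str
    np_load_avg: float  # per-processor load value compared against the threshold
    nproc: int
    threshold: float

    @property
    def raw_load_avg(self) -> float:
        # np_load_avg is the load average divided by the number of processors
        return self.np_load_avg * self.nproc

    @property
    def overloaded(self) -> bool:
        # The message reports the instance as overloaded when value >= threshold
        return self.np_load_avg >= self.threshold


def parse_overload(line: str) -> Optional[OverloadInfo]:
    """Return the parsed values, or None if the line is not an overload message."""
    m = _OVERLOAD_RE.search(line)
    if m is None:
        return None
    return OverloadInfo(
        queue_instance=m["qi"],
        np_load_avg=float(m["val"]),
        nproc=int(m["nproc"]),
        threshold=float(m["thr"]),
    )


if __name__ == "__main__":
    line = ('queue instance "giftbot@tools-exec-gift.eqiad.wmflabs" dropped because '
            'it is overloaded: np_load_avg=7.430000 (= 7.430000 + 0.50 * 0.000000 '
            'with nproc=2) >= 2.00')
    info = parse_overload(line)
    print(info.raw_load_avg, info.overloaded)
```

On the line above this reports a raw load average of roughly 14.9 and confirms the instance is over its threshold, which is why no further tasks were dispatched to tools-exec-gift.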

scfc moved this task from Triage to Backlog on the Toolforge board. · Feb 16 2017, 11:21 PM
Giftpflanze closed this task as Resolved. · May 14 2018, 1:08 AM
Giftpflanze claimed this task.