
Provide user_slot resource in grid
Closed, Resolved (Public)

Description

Hasteur asked on IRC for a way to limit some jobs to five in parallel while keeping "room for random pitch jobs". That room would not be available if he just let the grid handle the scheduling, because the grid would use all 16 (?) slots.

Toolserver has a user_slot resource (cf. https://wiki.toolserver.org/view/Job_scheduling#Optional_resources):

* user_slot=1
This resource is limited to 10 slots for each user. It has no specific meaning and can be used for limiting the number of jobs that are executed in parallel by a single user.
E.g. if you have different scripts that all edit wiki pages and you would like to have them run sequentially, so that only one job runs at a time, you can request -l user_slot=10 for each job. While one job is running, it consumes all ten available user_slots, and all other jobs requesting this resource are queued until the first job has finished.
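
The arithmetic in the quoted passage can be sketched in a few lines. This is only an illustration, not how the grid engine is implemented: the function name `max_parallel_jobs` is made up here, and it assumes that every job of the user requests the same amount of `user_slot`.

```python
def max_parallel_jobs(quota: int, request: int) -> int:
    """Number of jobs that fit at once when each consumes `request` of `quota`."""
    if request <= 0:
        raise ValueError("request must be positive")
    return quota // request


# Per-user quota of 10, as described for Toolserver.
print(max_parallel_jobs(10, 10))  # each job takes the whole quota
print(max_parallel_jobs(10, 5))   # two jobs share the quota
print(max_parallel_jobs(10, 1))   # the smallest request per job
```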

That would be useful in Tools as well. For better granularity, however, I propose we choose a per-user limit of 100 or 1000 instead of 10.
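
To make the granularity argument concrete, here is a rough sketch that lists which parallel-job limits a given quota can express. The helper `achievable_limits` is hypothetical, and it assumes all jobs in a group request the same integer amount.

```python
def achievable_limits(quota: int) -> list[int]:
    """Distinct parallel-job limits reachable with one uniform request size."""
    return sorted({quota // request for request in range(1, quota + 1)})


print(achievable_limits(10))        # [1, 2, 3, 5, 10]
print(4 in achievable_limits(10))   # False: no request size yields exactly 4
print(4 in achievable_limits(100))  # True: e.g. a request of 25
```

With a quota of 10 there is no way to allow exactly four jobs at once, while a quota of 100 can express that limit.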


Version: unspecified
Severity: normal

Details

Reference
bz52976

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 1:53 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz52976.
scfc created this task. Aug 18 2013, 3:13 AM
scfc added a comment. Aug 18 2013, 6:35 PM

Second thought: if resource definitions are cheap, we should offer a bunch of them, i.e. user_slot_{A..Z}.

coren added a comment. Oct 10 2013, 5:02 PM

They're cheap, and I can think of no reason to not offer a couple (though I think 26 is overkill).

I'll put half at 60 (nicely divisible by 2, 3, 4, 5, 6, 10, 12, 15) and half at 64 (power-of-two) so most use cases

Merl added a comment. Jun 5 2014, 9:48 AM

I need to limit the number of some unrelated SGE jobs running in parallel. For example, if too many scripts editing in MediaWiki are running at once, I could hit a rate limit (either a technical one or one set by a local rule for bots).

For this reason, on TS I added a resource quota set called "userslots". Every user has 10 slots available globally. So scripts requesting the resource with "-l user_slot=10" are executed sequentially, with "-l user_slot=5" only two jobs can run in parallel, and so on. Another maximum number would be OK for me, too.
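
As a rough illustration of the behaviour described above, the following sketch splits a user's submitted jobs into those that can start and those that stay queued. It is a deliberately simplified model, not the real scheduler: the function `dispatch` is hypothetical, jobs are considered in submission order, and a job starts whenever its request still fits into the remaining per-user quota.

```python
def dispatch(requests: list[int], quota: int = 10) -> tuple[list[int], list[int]]:
    """Split job indices into (running, queued) for a single user.

    Simplified model: jobs are considered in submission order and a job
    starts whenever its request still fits into the remaining quota.
    """
    remaining = quota
    running, queued = [], []
    for index, request in enumerate(requests):
        if request <= remaining:
            remaining -= request
            running.append(index)
        else:
            queued.append(index)
    return running, queued


print(dispatch([10, 10, 10]))  # ([0], [1, 2]): strictly sequential
print(dispatch([5, 5, 5]))     # ([0, 1], [2]): two in parallel, one waits
```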

For documentation, this is the current config on TS:
$ qconf -srqs userslots
{
   name         userslots
   description  Limit users to 10 "user" slots
   enabled      TRUE
   limit        users {*} hosts * to user_slot=10
}

$ qconf -sc
#name       shortcut   type  relop  requestable  consumable  default  urgency
#----------------------------------------------------------------------------
user_slot   user_slot  INT   <=     YES          YES         0        0

$ qconf -se global
hostname        global
complex_values  user_slot=2048

The last value (2048) only needs to be greater than 10 * "maximum slots per host" (which is currently 100 on labs), so that this host configuration limit is never hit.

silke added a comment. Jun 10 2014, 9:36 AM

Hi Marc-André, not a blocker, but can you please make this happen for merl nevertheless?
Best, Silke

coren added a comment. Jul 9 2014, 1:51 PM

I went with 60 because that's a very divisible integer. The complex is named 'user_slot' with shortcut 'u'.
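
To show why a highly divisible quota is convenient, this small sketch maps each parallel-job limit that divides 60 evenly to the request size that produces it. The helper `exact_requests` is hypothetical, and it assumes the per-user limit is 60 and that all jobs in a group request the same amount.

```python
def exact_requests(quota: int = 60) -> dict[int, int]:
    """Map each parallel-job limit that divides `quota` to its request size."""
    return {
        limit: quota // limit
        for limit in range(1, quota + 1)
        if quota % limit == 0
    }


requests = exact_requests(60)
print(sorted(requests))  # [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]
print(requests[5])       # 12
```

Under these assumptions, the five-in-parallel case from the task description would correspond to each job requesting 12 of the 60 slots, for example with -l user_slot=12.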