In several cases, a single event (an edit to a very popular template, for instance) triggers a huge number of jobs to be enqueued for one specific job type on one wiki. In those cases, with the current jobqueue we are completely unable to react by raising the number of workers for that specific job type/wiki.
We want the new transport to be smarter, and in fact changeprop already has better handling of concurrency. What I would like is the ability to change concurrency quickly in reaction to such an event, without going through the puppet patch/review/merge/apply/restart cycle the current jobrunner requires (and which cannot raise the concurrency for a specific wiki anyway).
So ideally operations folks would like to have the following:
- We have a global concurrency limit for all changeprop requests, be it truly global (across the cluster) or local to a specific instance. This will allow us to fine-tune the number of concurrently running jobs, e.g. to match the number of hhvm workers we globally dedicate to this duty.
- Each job type *can* have a weight, defaulting to 1. Each job type i then gets a maximum concurrency of max(w_i / sum_j(w_j) * global_concurrency, 1).
- We should be able to change the weight of a job type without a code review or a full restart of the service.
- Ideally, we should be able to modify the weight for a specific wiki too, but this is just a nice-to-have in my opinion.
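To make the weighting concrete, here is a minimal sketch of the formula above; the job-type names and numbers are purely illustrative, and rounding fractional slots down is my assumption (the formula itself leaves that unspecified):

```python
import math

def max_concurrency(weights, job_type, global_concurrency):
    """max(w_i / sum(w_j) * global_concurrency, 1), rounded down to a
    whole number of worker slots (rounding is an assumption)."""
    share = weights[job_type] / sum(weights.values())
    return max(math.floor(share * global_concurrency), 1)

# Illustrative weights: a heavy job type gets most of the global budget...
weights = {"refreshLinks": 6, "htmlCacheUpdate": 3, "cirrusSearchLinksUpdate": 1}
print(max_concurrency(weights, "refreshLinks", 100))             # → 60
print(max_concurrency(weights, "cirrusSearchLinksUpdate", 100))  # → 10

# ...while the max(..., 1) clause guarantees even a tiny weight still runs:
print(max_concurrency({"big": 999, "tiny": 1}, "tiny", 100))     # → 1
```

Note that with this scheme, bumping one job type's weight automatically shrinks every other type's share, since the global budget stays fixed.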
To this end, given that service-runner has the ability to reload config files, it could be enough to have a dedicated file with the concurrency settings, generated from etcd via confd, and then send a signal to changeprop so that it re-reads the config.
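As a rough sketch of that reload path (the file name, the JSON layout, and the choice of SIGHUP are all assumptions for illustration, not changeprop's or service-runner's actual interface):

```python
import json
import signal

# Hypothetical file that confd would render from etcd keys.
CONFIG_PATH = "/etc/changeprop/concurrency.json"

# In-memory weight table consulted when dispatching jobs.
weights = {}

def load_weights(path=CONFIG_PATH):
    """Re-read the confd-generated file into the in-memory weight table."""
    with open(path) as f:
        data = json.load(f)
    weights.clear()
    weights.update(data.get("weights", {}))
    return weights

# Re-read the file whenever confd's reload_cmd (or an operator) sends SIGHUP,
# mirroring the signal-based reload proposed above for changeprop.
signal.signal(signal.SIGHUP, lambda signum, frame: load_weights())
```

With this shape, changing a weight is just an etcd write: confd rewrites the file and fires the signal, with no code review, deploy, or full restart involved.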