
[Story] Dispatching via delayed jobs (instead of cron script)
Open, LowPublic

Description

Redis based job queues support delayed job execution. Implement dispatching based on this instead of relying on a cron job.
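One common way to get delayed, deduplicated jobs out of Redis is a sorted set scored by execution time (ZADD with NX for dedup, ZRANGEBYSCORE to pop due jobs). The following is a minimal in-memory sketch of that pattern, not the actual Wikibase/MediaWiki job-queue API; all names are illustrative.

```python
import time
import heapq

class DelayedJobQueue:
    """In-memory model of a Redis sorted-set delay queue (ZADD NX / ZRANGEBYSCORE)."""

    def __init__(self):
        self._scores = {}   # job payload -> scheduled execution time
        self._heap = []     # (execute_at, payload)

    def schedule(self, payload, delay, now=None):
        """Schedule a job; like ZADD NX, an identical payload already queued wins."""
        now = time.time() if now is None else now
        if payload in self._scores:
            return False  # deduplicated: an older identical job is already waiting
        execute_at = now + delay
        self._scores[payload] = execute_at
        heapq.heappush(self._heap, (execute_at, payload))
        return True

    def pop_due(self, now=None):
        """Return payloads whose delay has elapsed (like ZRANGEBYSCORE 0 now + ZREM)."""
        now = time.time() if now is None else now
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, payload = heapq.heappop(self._heap)
            if self._scores.pop(payload, None) is not None:
                due.append(payload)
        return due
```

With a single generic payload such as "DispatchTriggerJob", every edit can call schedule() and at most one trigger is ever pending, which is exactly the dedup behaviour the description below relies on.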

Idea:

  • Every edit schedules a delayed DispatchTriggerJob. That job is completely generic and holds no info at all, so all DispatchTriggerJobs of this kind are identical. This means that new jobs get ignored if there is already an older job waiting for execution.
  • (option a) DispatchTriggerJob would poll the changes table, as we do now, and dispatch any pending changes to the most lagged wiki(s). This means that passes for long tail wikis will often end up doing nothing. If "doing nothing" is quick enough, we could simply go and look at the next wiki, until some minimum number of changes has been processed, or some maximum time has been exceeded.
  • (option b) DispatchTriggerJob would take the next batch of changes and send notifications for all of them to the interested wikis. That means that each pass has to (potentially) push to all wikis, which may take quite a long time.
  • If there are still pending jobs or wikis to service, DispatchTriggerJob schedules another (delayed?) DispatchTriggerJob before it exits. How many new triggers should be scheduled? We need to avoid starvation, but also prevent explosive growth of the number of trigger jobs.
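The idea above, in its option (a) form, can be sketched as a single trigger-job pass: poll per-wiki pending changes (most lagged wikis first), keep going while "doing nothing" is cheap, and stop once a minimum amount of work is done or a time budget runs out, rescheduling exactly one follow-up trigger if work may remain. All names (fetch_pending_changes, dispatch_to, reschedule) and the thresholds are hypothetical, not the actual Wikibase code.

```python
import time

MIN_CHANGES = 50      # keep going until at least this many changes are processed...
MAX_SECONDS = 10.0    # ...or until this time budget is exhausted

def run_trigger_job(wikis_by_lag, fetch_pending_changes, dispatch_to, reschedule):
    """One pass of a generic DispatchTriggerJob (option a, illustrative only).

    wikis_by_lag: client wiki IDs, most lagged first.
    fetch_pending_changes(wiki): pending changes not yet dispatched to that wiki.
    dispatch_to(wiki, changes): push change notifications to the wiki.
    reschedule(): enqueue exactly one new (delayed) trigger job.
    """
    start = time.monotonic()
    processed = 0
    leftover = False
    for wiki in wikis_by_lag:
        changes = fetch_pending_changes(wiki)
        if changes:  # long-tail wikis often have nothing; skipping them is cheap
            dispatch_to(wiki, changes)
            processed += len(changes)
        if processed >= MIN_CHANGES or time.monotonic() - start >= MAX_SECONDS:
            leftover = True  # we stopped early, so wikis remain unchecked
            break
    # Schedule at most one follow-up trigger to prevent explosive growth,
    # but always schedule one if work may remain, to avoid starvation.
    if leftover or processed > 0:
        reschedule()
    return processed
```

Scheduling exactly one successor per pass is the simplest answer to the "how many new triggers?" question in the last bullet: the number of in-flight trigger jobs can never grow, and progress is still guaranteed as long as edits keep scheduling fresh triggers.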

T48476: Running dispatchChanges as cronjob doesn't close down as expected
T47892: Make Wikidata changes appear quicker in the watchlist on the client

Details

Reference
bz46643

Related Objects

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:25 AM
bzimport set Reference to bz46643.
bzimport added a subscriber: Unknown Object (MLST).
daniel created this task.Mar 28 2013, 5:15 PM
  • Bug 52803 has been marked as a duplicate of this bug.
Lydia_Pintscher removed a subscriber: Unknown Object (MLST).Dec 1 2014, 2:31 PM
matej_suchanek set Security to None.
JanZerebecki renamed this task from Dispatching via delayed jobs to Dispatching via delayed jobs (instead of cron script).Jul 13 2015, 2:10 PM
JanZerebecki added a subscriber: Tobi_WMDE_SW.
hoo added a comment.Aug 9 2015, 1:49 PM

Do we want one job per edit or how exactly is this supposed to look? Wrapping the current dispatching mechanism in jobs doesn't really sound like a good idea to me.

Restricted Application added a subscriber: Aklapper.Aug 9 2015, 1:49 PM

I'll edit the description to give more details.

daniel updated the task description. (Show Details)Aug 10 2015, 4:56 PM
Jonas renamed this task from Dispatching via delayed jobs (instead of cron script) to [Task] Dispatching via delayed jobs (instead of cron script).Aug 13 2015, 2:41 PM
daniel renamed this task from [Task] Dispatching via delayed jobs (instead of cron script) to [Story] Dispatching via delayed jobs (instead of cron script).Sep 3 2015, 2:17 PM
daniel updated the task description. (Show Details)Sep 3 2015, 2:26 PM
Addshore lowered the priority of this task from High to Low.Jan 23 2019, 1:14 PM
Addshore added a subscriber: Addshore.

Change dispatching is currently very fast

I'm roughly repeating what I said in T193733#5276659 about the reasons:

  • It's a SPOF: if the mwmaint1002 node goes down due to hardware issues, we can't dispatch at all. If the node needs to be restarted, dispatching has to stop until that's done.
  • "Noisy neighbor" effect: people run maintenance scripts on the mwmaint node, so dispatching can be choked to death by other scripts, and a buggy script that eats all of the resources can make running maintenance scripts impossible.
  • The distributed system we designed for this (pulling the wikis using three cron jobs, dispatching to a basically random pick of the most stalled ones) is custom-built; it could instead use the great job-queue infrastructure we already have.
  • Cron jobs are hard to debug; moving them to the job queue makes them easier to debug in Logstash.

Reducing the number of edits happening on Wikidata (using one big wbeditentity API call instead of several when a termbox v1 edit happens) can help, but there might be better ways to do it. @Joe has lots of good insight in this regard.