
Wikidata change propagation: reduce dispatcher randomness and shorten dispatch interval
Closed, Declined (Public)

Description

Every now and then, we see a big spike in dispatch lag, usually for one or two specific wikis. To mitigate this problem and allow the backlog to be dealt with quickly, we should tweak the parameters of the dispatchChanges script to be more suitable for a high-volume repo with many clients. In particular:

  • reduce --randomness, perhaps to 3. With the default value of 10, the lagged wiki has a 1/10 chance to be picked for a given dispatcher run.
  • reduce --dispatch-interval to 5. The default of 60 means that the lagged wiki will receive at most one batch of changes per 60 seconds (no matter how many dispatchers we are running).

Note: reducing --dispatch-interval may lead to increased overhead for smaller wikis. See T179008 for an alternative approach.
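
The effect of the two parameters can be put into rough numbers. The following is a minimal sketch based only on the description above, not on the dispatchChanges code itself: it assumes that each run picks uniformly among the `randomness` most lagged wikis, and that a wiki receives at most one batch per `dispatch_interval` seconds. Both function names are made up for this illustration.

```python
# Back-of-the-envelope model of the two dispatcher parameters.
# Assumption (from the task description, not from the actual script):
# each run picks uniformly among the `randomness` most lagged wikis,
# and one wiki gets at most one batch per `dispatch_interval` seconds.


def pick_probability(randomness: int) -> float:
    """Chance that the most lagged wiki is picked in a single dispatcher run."""
    if randomness < 1:
        raise ValueError("randomness must be at least 1")
    return 1 / randomness


def max_batches_per_hour(dispatch_interval: int) -> int:
    """Upper bound on batches one wiki can receive per hour,
    regardless of how many dispatchers are running."""
    if dispatch_interval < 1:
        raise ValueError("dispatch_interval must be at least 1 second")
    return 3600 // dispatch_interval


if __name__ == "__main__":
    for label, randomness, interval in [("default", 10, 60), ("proposed", 3, 5)]:
        print(
            f"{label}: pick chance per run = {pick_probability(randomness):.2f}, "
            f"max batches per hour = {max_batches_per_hour(interval)}"
        )
```

Under these assumptions, the proposed values raise the per-run chance of picking the lagged wiki from 1/10 to 1/3, and raise the ceiling on batches per wiki from 60 to 720 per hour.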

Event Timeline

thiemowmde subscribed.
  • This should be linked to a parent ticket, please.
  • Where is the code the task's description talks about?
  • What are the acceptance criteria for this ticket? What are the current numbers, and how do you expect them to have changed by the time we consider this ticket done?

So, none of the patches were linked to this ticket, but...

Set to 5: https://gerrit.wikimedia.org/r/#/c/387282/
Lag increased, tried setting to 10: https://gerrit.wikimedia.org/r/#/c/395512/
Lag increased so reset to 15: https://gerrit.wikimedia.org/r/#/c/395525/

I have many things that I want to try and write about this, and I will do so in the next few days.

Any updates? Can we close this?

So Daniel and I discussed many things a while back.

I have a chain of patches that will make moving on this easier, although it focuses on migrating to a new / our own lock manager.
These patches will also allow us to alter the config for the dispatching without having to alter the cron / have a merge in puppet.

Addshore changed the task status from Open to Stalled. Jun 25 2018, 9:55 AM
Addshore removed Addshore as the assignee of this task.

Marking as stalled for now.
The config is now in mediawiki-config.
This should probably be picked up by the campsite at some point.

Everything seems to be working just fine right now.