
Support per-db-shard concurrency in ChangeProp
Closed, Resolved · Public

Description

The parent task is about spiky connections to MySQL. The current theory is that the spikes occur because ChangeProp only supports global concurrency limits: when a big batch of jobs for a wiki on a particular DB shard arrives, all of the global concurrency is allocated to that one shard, which is too much for a single shard. The 'smoothing' that global concurrency limiting provides therefore doesn't help at the database level.

To fix that, we need per-db-shard concurrency. To get it, we probably need to partition the affected topics by DB shard, so we need to solve T157822 first.

Another issue is that we need to create a custom partitioner that is aware of the mediawiki-config dbname-to-shard mapping, preferably without copy-pasting the shard mapping into the Event-Platform repo.
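A minimal sketch of what such a partitioner could look like, in Python for illustration only. The mapping contents, the default shard, the shard ordering and the function name `partition_for` are all assumptions made up for this example; a real partitioner would load the mapping from mediawiki-config rather than hard-coding it.

```python
from typing import Dict, List

# Hypothetical dbname -> shard mapping (illustrative values only).
# In practice this would be read from mediawiki-config, not copied here.
SHARD_BY_DBNAME: Dict[str, str] = {
    "enwiki": "s1",
    "commonswiki": "s4",
    "dewiki": "s5",
}

# Assumed fallback shard for wikis missing from the mapping.
DEFAULT_SHARD = "s3"

# Fixed shard order: the index of a shard in this list is its partition number.
SHARD_ORDER: List[str] = ["s1", "s2", "s3", "s4", "s5"]


def partition_for(dbname: str) -> int:
    """Return the topic partition number for a job targeting `dbname`."""
    shard = SHARD_BY_DBNAME.get(dbname, DEFAULT_SHARD)
    return SHARD_ORDER.index(shard)
```

With this layout every job for a given shard lands in the same partition, so a consumer can reason about load per shard simply by looking at the partition number.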

Lastly, ChangeProp should support per-partition concurrency limits and, since partition names are just numbers in Kafka, we also need to integrate the DB-to-partition mapper into ChangeProp somehow.
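The idea of per-partition concurrency can be sketched as one semaphore per partition number, so a burst of jobs for one shard cannot consume the capacity of the others. This is an illustration in Python, not ChangeProp's implementation; the class name `PartitionLimiter` and its parameters are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, TypeVar

T = TypeVar("T")


class PartitionLimiter:
    """Caps the number of concurrently running jobs per partition (hypothetical helper)."""

    def __init__(self, limits: Dict[int, int], default_limit: int = 1) -> None:
        self._limits = limits
        self._default = default_limit
        self._semaphores: Dict[int, asyncio.Semaphore] = {}

    def _semaphore(self, partition: int) -> asyncio.Semaphore:
        # Create semaphores lazily so they are built inside the running event loop.
        if partition not in self._semaphores:
            limit = self._limits.get(partition, self._default)
            self._semaphores[partition] = asyncio.Semaphore(limit)
        return self._semaphores[partition]

    async def run(self, partition: int, job: Callable[[], Awaitable[T]]) -> T:
        # Jobs in the same partition wait for each other; other partitions are unaffected.
        async with self._semaphore(partition):
            return await job()
```

A consumer would call something like `await limiter.run(message_partition, handler)` for each message, where the partition number is the one assigned by the DB-shard partitioner.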

Details

Related Gerrit Patches:
mediawiki/services/change-propagation/jobqueue-deploy : master · Make a special rule for refreshLinks partitioned execution.

Event Timeline

Pchelolo triaged this task as High priority. Mar 14 2018, 8:31 PM
Pchelolo created this task.
Restricted Application added a subscriber: Aklapper. Mar 14 2018, 8:31 PM
Ottomata moved this task from Incoming to Blocked on the Analytics board. Mar 15 2018, 4:27 PM
Ottomata moved this task from Blocked to Radar on the Analytics board.

Change 420841 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Make a special rule for refreshLinks partitioned execution.

https://gerrit.wikimedia.org/r/420841

Change 420841 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Make a special rule for refreshLinks partitioned execution.

https://gerrit.wikimedia.org/r/420841

Mentioned in SAL (#wikimedia-operations) [2018-03-21T15:48:54Z] <ppchelko@tin> Started deploy [cpjobqueue/deploy@0dcdc82]: Partition the refreshLinks topic by DB shard T189738

Mentioned in SAL (#wikimedia-operations) [2018-03-21T15:51:56Z] <ppchelko@tin> Finished deploy [cpjobqueue/deploy@0dcdc82]: Partition the refreshLinks topic by DB shard T189738 (duration: 03m 03s)

Mentioned in SAL (#wikimedia-operations) [2018-03-21T15:53:16Z] <ppchelko@tin> Started deploy [cpjobqueue/deploy@b291728]: Partition the refreshLinks topic by DB shard T189738 take 2

Mentioned in SAL (#wikimedia-operations) [2018-03-21T15:53:56Z] <ppchelko@tin> Finished deploy [cpjobqueue/deploy@b291728]: Partition the refreshLinks topic by DB shard T189738 take 2 (duration: 00m 40s)

Pchelolo closed this task as Resolved. Mar 21 2018, 6:24 PM
Pchelolo claimed this task.

Deployed. Seems to be working fine, resolving.