
Partition the transclusions topic in ChangeProp
Closed, Resolved (Public)

Description

All the transclusion-related events are sent to the change-prop.transcludes.resource-change topic, which currently receives around 800 events per second. One of the ChangeProp workers is consistently at around 90% CPU usage, which means it is almost at its limit, since a worker can only use one CPU core.

Most likely this worker is the one doing Varnish purges on the transcludes topic: constructing HTCP packets is pretty CPU-intensive and is not bound on any IO. But we need to verify that. A brutal way to verify would be to kill the worker and look at the graphs, but I'm not sure that's a good idea. A less invasive way would be to add some sampled logging that includes the worker pid.
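A minimal sketch of what sampled logging with a worker pid could look like. It is written in Python for brevity (change-prop itself is a Node.js service), and `SAMPLE_RATE`, `handle_event`, and the record fields are hypothetical names, not anything from the ChangeProp codebase.

```python
import os
import random

# Hypothetical sampling rate: log roughly 1 in 1000 events so that the
# logging itself adds almost no load at ~800 events per second.
SAMPLE_RATE = 0.001


def should_sample(rate, rng=random.random):
    """Return True for approximately `rate` of all calls."""
    return rng() < rate


def make_log_record(topic, rule_name):
    """Build a log record that identifies the worker process by pid."""
    return {"pid": os.getpid(), "topic": topic, "rule": rule_name}


def handle_event(event, topic, rule_name, log, rate=SAMPLE_RATE, rng=random.random):
    """Process one event and emit a sampled log line carrying the worker pid."""
    # ... the real event processing (e.g. sending a purge) would happen here ...
    if should_sample(rate, rng):
        log(make_log_record(topic, rule_name))
```

Matching the pid in the sampled log lines against the pid of the process showing 90% CPU would tell us which rule that worker is executing, without killing anything.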

We need to consider partitioning the transcludes topic and adding support for partitioned topics in ChangeProp. Support for partitioning will come in handy when implementing T157088 too. Each rule should have a parameter that selects whether to use one worker for all partitions or a worker per partition, since we only want one rule to respect partitioning.
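To illustrate the proposed per-rule parameter, here is a rough Python sketch of how partitions could be assigned to workers. The flag name `worker_per_partition` and the function `plan_workers` are assumptions made for illustration only; the actual option name and mechanism in ChangeProp may differ.

```python
def plan_workers(partitions, worker_per_partition):
    """Return one list of partitions per worker for a single rule.

    `worker_per_partition` is a hypothetical rule-level flag:
      - True:  each partition gets its own worker, so the rule's load
               is spread over several CPU cores.
      - False: a single worker consumes every partition, which is the
               behaviour of a rule that ignores partitioning.
    """
    partitions = list(partitions)
    if worker_per_partition:
        return [[p] for p in partitions]
    return [partitions]
```

For example, with three partitions `plan_workers(range(3), True)` gives `[[0], [1], [2]]`, while `plan_workers(range(3), False)` gives `[[0, 1, 2]]`.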

Event Timeline

Pchelolo created this task. Feb 9 2017, 1:58 AM
Restricted Application added a subscriber: Aklapper. Feb 9 2017, 1:58 AM
Pchelolo closed this task as Declined. Jul 31 2019, 10:09 PM

We never actually got to the point where this was needed. We can reevaluate in the future.

Pchelolo reopened this task as Open. Jun 12 2020, 3:36 PM

After some instability of change-prop in k8s, we decided that we need this. The only change in the change-prop codebase would be to default the partition to null instead of 0. The automatic partitioner will do the rest.
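The difference between defaulting the partition to 0 and to null can be sketched as follows. This is a simplified Python model, not the Kafka client's code: `choose_partition` is a hypothetical helper, and the CRC32-modulo rule is only a stand-in for whatever partitioner the client library actually applies.

```python
import zlib


def choose_partition(key, num_partitions, partition=None):
    """Pick the partition that a message would be produced to.

    An explicit `partition` (such as the old default of 0) pins every
    message to that partition. Passing None (the "null" default) defers
    the choice to a partitioner, modelled here as CRC32(key) modulo the
    partition count. The hash choice is an assumption for illustration.
    """
    if partition is not None:
        return partition
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["page-%d" % i for i in range(100)]
pinned = {choose_partition(k, 4, partition=0) for k in keys}
spread = {choose_partition(k, 4) for k in keys}
```

Here `pinned` contains only partition 0, whereas `spread` covers several partitions, which is what lets multiple workers share the load.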

Aklapper removed Pchelolo as the assignee of this task. Jun 19 2020, 4:14 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

jijiki moved this task from Incoming 🐫 to Unsorted on the serviceops board. Aug 17 2020, 11:45 PM
Pchelolo claimed this task. Aug 18 2020, 9:17 PM

I've merged https://github.com/wikimedia/change-propagation/pull/351

Now we need to deploy change-prop with this change.

Change 623808 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/deployment-charts@master] Deploy change-propagation v0.10.3

https://gerrit.wikimedia.org/r/623808

Change 623808 merged by jenkins-bot:
[operations/deployment-charts@master] Deploy change-propagation v0.10.3

https://gerrit.wikimedia.org/r/623808

Pchelolo closed this task as Resolved. Sep 2 2020, 4:54 PM

Support for partitioning was added to change-prop and is used for purges. We don't really need to partition the transclusions.