Partition the transclusions topic in ChangeProp
Closed, Resolved, Public

Description

All the transclusion-related events are sent to the change-prop.transcludes.resource-change topic, which currently carries around 800 events per second. One of the ChangeProp workers is always at around 90% CPU usage, which means it's almost at its limit, since a worker can only use one CPU core.

Most likely this worker is the one doing Varnish purges on the transcludes topic: constructing HTCP packets is pretty CPU-intensive and isn't bound on any IO, but we need to verify that. A brutal way to verify would be to kill the worker and look at the graphs, but I'm not sure that's a good idea. A less invasive way would be to add some sampled logging that includes the worker pid.
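The sampled logging idea could look roughly like the sketch below. `makeSampledLogger`, its parameters and the log fields are hypothetical names made up for illustration, not existing ChangeProp code, and the 1-in-1000 rate is an arbitrary assumption.

```typescript
// Hypothetical helper, not part of ChangeProp: logs about one in `sampleRate`
// events and tags each log line with the worker pid.
type LogFn = (line: string) => void;

function makeSampledLogger(
    sampleRate: number,
    pid: number,
    log: LogFn,
    random: () => number = Math.random,
): (topic: string) => boolean {
    const threshold = 1 / sampleRate;
    return (topic: string): boolean => {
        if (random() >= threshold) {
            return false; // not sampled, nothing is logged
        }
        log(JSON.stringify({ pid, topic, sampleRate }));
        return true;
    };
}

// Usage: in a real worker `pid` would be process.pid; 4242 is a placeholder.
const logPurgeSample = makeSampledLogger(1000, 4242, (line) => console.log(line));
logPurgeSample("change-prop.transcludes.resource-change");
```

Matching the pids in the sampled log lines against the pid of the busy process would show whether the purge rule is the one running hot, without killing anything.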

We need to consider partitioning the transcludes topic and adding support for partitioned topics in ChangeProp. Support for partitioning will come in handy when implementing T157088 too. Each rule should have a parameter that selects whether to use one worker for all partitions or one worker per partition, since we only want one rule to respect partitioning.
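As a rough sketch of how such a per-rule parameter might behave: the `workerPerPartition` flag, the `RuleSpec` and `Assignment` shapes and `planAssignments` below are all hypothetical and do not reflect ChangeProp's actual config schema.

```typescript
// Hypothetical rule shape: `workerPerPartition` is an assumed flag name.
interface RuleSpec {
    name: string;
    topic: string;
    workerPerPartition: boolean;
}

// One entry per worker: the partitions that worker would consume for the rule.
interface Assignment {
    rule: string;
    topic: string;
    partitions: number[];
}

function planAssignments(rule: RuleSpec, numPartitions: number): Assignment[] {
    const all = Array.from({ length: numPartitions }, (_, i) => i);
    if (!rule.workerPerPartition) {
        // A single worker handles every partition of the topic.
        return [{ rule: rule.name, topic: rule.topic, partitions: all }];
    }
    // One worker per partition, so the CPU load is spread across cores.
    return all.map((p) => ({ rule: rule.name, topic: rule.topic, partitions: [p] }));
}
```

With 4 partitions, a rule that sets the flag would get four single-partition workers, while every other rule keeps one worker that reads all four partitions.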

Event Timeline

We never actually got to the point where this was needed. We can reevaluate in the future.

After some instability of change-prop in k8s, we decided that we need this. The only change in the change-prop codebase would be to default the partition to null instead of 0. The automatic partitioner will do the rest.
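To illustrate why the default matters, here is a minimal sketch. `choosePartition` and `hashKey` are hypothetical, and the hash is only a stand-in: it is not the algorithm the Kafka client's automatic partitioner actually uses.

```typescript
// Illustrative stand-in for a key-based partitioner; NOT the real client hashing.
function hashKey(key: string): number {
    let h = 0;
    for (let i = 0; i < key.length; i++) {
        h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep the value in uint32 range
    }
    return h;
}

// `explicit === null` means "let the partitioner decide";
// a number pins the message to that partition.
function choosePartition(explicit: number | null, key: string, numPartitions: number): number {
    if (explicit !== null) {
        return explicit;
    }
    return hashKey(key) % numPartitions;
}

const keys = ["a", "b", "c", "d"];
// Defaulting to 0 sends every message to partition 0, whatever the key.
const pinned = keys.map((k) => choosePartition(0, k, 4));
// Defaulting to null lets the key decide, so messages spread across partitions.
const spread = keys.map((k) => choosePartition(null, k, 4));
```

With the default at 0, adding partitions to a topic changes nothing because they never receive messages; with null, the producer side needs no further logic.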

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Change 623808 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/deployment-charts@master] Deploy change-propagation v0.10.3

https://gerrit.wikimedia.org/r/623808

Change 623808 merged by jenkins-bot:
[operations/deployment-charts@master] Deploy change-propagation v0.10.3

https://gerrit.wikimedia.org/r/623808

The support for partitioning was added to change-prop and used for purges. We don't really need to partition the transclusions.