Interleave processing of backlink jobs
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	• GWicke
	Dec 2 2016, 6:58 PM

Description

ChangeProp currently expands backlinks eagerly into its constituent jobs. This is nicely modular, and conceptually simple. However, since backlink expansion is fairly quick, it also causes large jobs to be processed in large batches. Consequences are:

Large backlink jobs are fully processed before starting the processing of smaller, later jobs. This causes large delays on those smaller jobs.
Especially expensive backlink jobs (for example, involving an expensive template) are processed bunched in time. While concurrency limits in ChangeProp are designed to keep the load constant anyway, defective timeout logic / resource limiting (ex: T97192) can lead to higher actual concurrency and resource consumption by piling up zombie requests and triggering retries in services like Parsoid.

Proposed change

Since backlinks are expanded one batch request at a time, we can increase interleaving of jobs by performing the expansion lazily, as part of the "leaf" event processing.

Current flow:

Edit event topic 
  -> Eager expansion into leaf job topic, using continuation topic
  -> Leaf job execution

Proposed flow:

Edit event 
  -> Expand one batch to leaf job topic & add a special continuation event in the same topic 
  -> process leaf events 
    -> when encountering continuation event, continue batch expansion & add continuation if backlinks remaining.

The biggest change is interleaving continuations / expansions with the leaf event processing.

Details

	Subject	Repo	Branch	Lines +/-
	Change-Prop: Added internal job schema	mediawiki/event-schemas	master	+92 -1

Customize query in gerrit

Related Objects
Search...

		Status	Subtype	Assigned	Task
		Resolved		• Pchelolo	T152229 Interleave processing of backlink jobs
		Declined		Ottomata	T152557 Allow multiple schemas in a single EventBus topic.

Event Timeline

• GWicke created this task.Dec 2 2016, 6:58 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 2 2016, 6:58 PM

• GWicke triaged this task as Medium priority.Dec 2 2016, 6:58 PM

• GWicke moved this task from Backlog to next on the Services board.

• GWicke edited projects, added Services (next); removed Services.

• GWicke mentioned this in T151702: API cluster failure / OOM.Dec 2 2016, 7:39 PM

• mobrovac added projects: Event-Platform, User-mobrovac.Dec 4 2016, 1:14 PM

Here's an option on how we could do this and still stay nice and modular:

The backlinks module has an internal URI right now that's called for a template. Currently we POST the original revision-create event to /sys/transcludes/{title}. We can change that to have a /sys/transcludes/{title}{/continue_token} uri.
Instead of posting a continuation event into the special continue topic we can post a resource-change event with this internal URI.
The primary rule that rerenders RESTBase will have 2 cases: one will react to normal resource-change events and the second one will react to resuce-changes with this special internal URI.

This wouldn't require a lot of code changes and the expansion itself stays nice and modular. The downside is that we expose some of ChangeProp internal structure to the resource-change topic, but that topic is pretty internal on it's own, so I don't think it's a major issue. Thoughts?

@Pchelolo, that sounds very good to me.

To avoid special-casing the resource-change topic structure, perhaps we could add an optional "continuations" property to the meta structure? This could optionally contain an array of URLs to call before marking the event processing "done".

By integrating it in the "meta" structure, we could apply this continuation strategy to any of the topics, and write some generalized filter / handler that interprets it.

Hm, our current continuation event has several fields not present in the resource-change event:

sequence_num - this is used for deduplication.
continue - the continuation token for the event
original_event - used to hold the original event that triggered the sequence of continuations.

Adding all 3 of these to resource-change event schema seems VERY hacky and wrong semantically. Maybe we should allow 1 topic to have a list of acceptable schemas so that we could push both resource-change event to the topic and a continue event and then switch in the rule? cc @Ottomata

We could still consider adding one optional continuation property in meta, which in turn would have those sub-properties.

For making a decision between different schemas vs. optional properties, here are some areas to look at:

Complexity of consuming a topic for a) changeprop, and b) other consumers.
Documentation of topic schemas.
Complexity of schema event validation.

Change 325620 had a related patch set uploaded (by Ppchelko):
Change-Prop: Added internal job schema

https://gerrit.wikimedia.org/r/325620

gerritbot added a project: Patch-For-Review.Dec 6 2016, 8:21 PM

One point that gave been made is that we have to have separate events for actual rerender and the continuation event. If we have one event that does both processing and continuation, in case something fails and being retried the continuation chain will either break or forked.

• ssastry mentioned this in T152074: Separate clusters for asynchronous processing from the ones for public consumption.Dec 7 2016, 4:39 PM

Joe subscribed.Dec 7 2016, 4:46 PM

I have one question: is this going to guarantee a limited rate of processing of a single transclusion or it's just going to be an after-effect of having a lot of incoming jobs?

I'd prefer for the rate of jobs submitted to restbase from a single transclusion to be effectively rate-limited, instead of relying on enough other messages being in the queue for spreading processing.

In T152229#2854392, @Joe wrote:

I have one question: is this going to guarantee a limited rate of processing of a single transclusion or it's just going to be an after-effect of having a lot of incoming jobs?

I'd prefer for the rate of jobs submitted to restbase from a single transclusion to be effectively rate-limited, instead of relying on enough other messages being in the queue for spreading processing.

The rate of transclusion-related jobs is limited right now, but the limit is global for all transclusion-related jobs. The limit is 400 concurrent jobs and this number was chosen after a long series of gradual increases because for any smaller number we couldn't keep up with the rate of incoming events.

What will change here is that right now all the reparses derived from a single template edit are added to the queue very quickly, so basically there's almost no chance for different template edits to mix together. After this change if 2 templates we edited close in time the processing for pages transcluding these two templates will interleave. Also it should be more gradual since you don't add a million jobs instantly, but instead you add a chunks of 500 events, then you wait for them to get processed, then you add 500 more etc.

In T152229#2854682, @Pchelolo wrote:

In T152229#2854392, @Joe wrote:

I have one question: is this going to guarantee a limited rate of processing of a single transclusion or it's just going to be an after-effect of having a lot of incoming jobs?

I'd prefer for the rate of jobs submitted to restbase from a single transclusion to be effectively rate-limited, instead of relying on enough other messages being in the queue for spreading processing.

The rate of transclusion-related jobs is limited right now, but the limit is global for all transclusion-related jobs. The limit is 400 concurrent jobs and this number was chosen after a long series of gradual increases because for any smaller number we couldn't keep up with the rate of incoming events.

Looking at https://grafana.wikimedia.org/dashboard/db/eventbus?panelId=8&fullscreen&from=1472700500461&to=1481133680425 and the on-transclusion-update_exec topic, it looks like the average is 193 reqs/sec and page_edit topic shows an average of 13. So, it looks like you should be able to reduce this concurrency rate to 250 and still keep up? This also matches the avg oldid parse rate of 201 in Parsoid over the last 3 months. Or, am I missing some other details?

@ssastry Basically it depends on how timely we want these updates to happen. If we reduce it to 250 the actual rerender could get delayed for several days for a large template. Right now with 400 the maximum delay we've seen is about 24 hours.

Change 325620 abandoned by Ppchelko:
Change-Prop: Added internal job schema

Reason:
Per discussion in T152557

https://gerrit.wikimedia.org/r/325620

Mentioned in SAL (#wikimedia-operations) [2016-12-08T21:19:31Z] <mobrovac@tin> Starting deploy [changeprop/deploy@2fe48e0]: Deploying fix for T152229

Mentioned in SAL (#wikimedia-operations) [2016-12-08T21:20:22Z] <mobrovac@tin> Finished deploy [changeprop/deploy@2fe48e0]: Deploying fix for T152229 (duration: 00m 51s)

Deployed, resolving.

• mobrovac closed subtask T152557: Allow multiple schemas in a single EventBus topic. as Declined.Dec 8 2016, 9:33 PM

Interleave processing of backlink jobsClosed, ResolvedPublicActions