RFC: Requirements for change propagation
Closed, Resolved, Public

Description

At a high level, change propagation involves three main ingredients:

  1. Distributing change events, and
  2. processing and propagating those change events using
  3. a dependency graph.

The bulk of this RFC, which focuses on the general requirements & considerations, can be found here:
https://www.mediawiki.org/wiki/Requests_for_comment/Requirements_for_change_propagation

Current status

  • T84923: Reliable publish / subscribe event bus is deployed, and provides event streams for edits & resource changes. Under the hood, all topics are prefixed by source datacenter & replicated (see T127718), which lets us cleanly move event processing between datacenters.
  • T117933: Change propagation service, phase 1 is gradually being rolled out at the moment. Driven by a declarative config file, this service subscribes to EventBus topics, and processes events by making HTTP requests to other services, or by sending purges to Varnishes. Events are consumed from specific topics, and can be further filtered by arbitrary properties, including URL patterns.
    • An example module for iterative backlink processing was already created. This module nicely separates the expansion of dependencies from their processing, and can serve as a model for further iterative dependency expansion.
  • T126687: RFC: Publish all resource changes to a single topic introduced a single topic recording URL-based resource changes. This topic is intended to be used for CDN purges, and is already used to trigger secondary updates in the ChangeProp service. @Smalyshev and @aaron are looking into sending all MediaWiki CDN purge requests to this topic.
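The topic subscription and filtering described in the status items above can be illustrated with a small sketch. This is not the ChangeProp config format or implementation: the `Rule` fields, the `dc1.` prefix format, the flat event shape with a `uri` property, and the example hosts are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One hypothetical rule: which topic to consume, how to filter, what to request."""
    topic: str               # topic name without the datacenter prefix
    uri_pattern: str         # regular expression matched against event["uri"]
    request_template: str    # URL to request; may reference named groups of uri_pattern
    match: dict = field(default_factory=dict)  # extra exact-match property filters


def base_topic(full_topic: str) -> str:
    """Strip an assumed '<datacenter>.' prefix, e.g. 'dc1.resource_change'."""
    return full_topic.split(".", 1)[-1]


def plan_requests(rules, full_topic, event):
    """Return the URLs that would be requested for one event (no network I/O)."""
    urls = []
    for rule in rules:
        if rule.topic != base_topic(full_topic):
            continue  # rule listens to a different topic
        if any(event.get(k) != v for k, v in rule.match.items()):
            continue  # a property filter did not match
        m = re.match(rule.uri_pattern, event.get("uri", ""))
        if m:
            urls.append(rule.request_template.format(**m.groupdict()))
    return urls


rules = [
    Rule(
        topic="resource_change",
        uri_pattern=r"^https://en\.example\.org/wiki/(?P<title>[^/]+)$",
        request_template="https://rest.example.org/page/html/{title}",
    )
]
event = {"uri": "https://en.example.org/wiki/Foo"}
print(plan_requests(rules, "dc1.resource_change", event))
```

Because the datacenter prefix is stripped before matching, the same rule would apply to events from any source datacenter, which is the property that makes it possible to move processing between datacenters.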

Next steps and open questions

  • ChangeProp service expansion: ChangeProp will gradually expand to cover more use cases. Initially the services team will focus on RESTBase's use cases (including red link, template & media re-renders), and will also move CDN purging from RESTBase to ChangeProp.
  • Reliable CDN purging: There have been various discussions about making CDN purging more reliable. Current ideas include running Kafka clients on each Varnish node, which would effectively replace the best-effort multicast setup. However, there are also plans to reduce the purge volume by using alternate keys (like Varnish's XKey / T122881). A single purge would match all resources associated with an underlying resource like a page. However, with asynchronous updates it will be tricky to determine the best time to issue such a purge. We will still need to issue several purges after a primary event. Coordinating these is currently an open problem.
  • Reliable RCStream: @Ottomata has been looking into leveraging Kafka events in RCStream. This can potentially let clients catch up after being disconnected. See T130651: EventStreams.
  • Cross-project dependency tracking & change propagation: We currently don't have any general way to track dependencies across projects. Special-case mechanisms were developed for Commons and, to some degree, Wikidata, but other applications (like T91162: RFC: Shadow namespaces) will need dependency tracking abilities as well. It would be good to generalize this infrastructure, so that efforts can be shared across several use cases. Open questions in this space:
    • API requirements & possible designs for querying & updating dependencies.
    • Dependency graph storage: T105766: RFC: Dependency graph storage; sketch: adjacency list in DB discusses some options for storing such dependencies in a general manner, but it's early days & we should probably make our requirements more precise before diving too deeply into the concrete design.
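The alternate-key idea from the "Reliable CDN purging" item above can be sketched as a toy cache in which each entry is tagged with the keys of the underlying resources it was built from. This is only an in-memory illustration of the concept, not Varnish's XKey module; the class and the `page:Foo` key format are assumptions.

```python
from collections import defaultdict


class KeyedCache:
    """Toy in-memory cache where each entry carries a set of alternate keys."""

    def __init__(self):
        self._entries = {}                # url -> cached body
        self._keys_of = {}                # url -> alternate keys of that entry
        self._by_key = defaultdict(set)   # alternate key -> urls tagged with it

    def store(self, url, body, keys):
        self.evict(url)  # drop stale index entries if the URL is stored again
        self._entries[url] = body
        self._keys_of[url] = set(keys)
        for key in keys:
            self._by_key[key].add(url)

    def evict(self, url):
        self._entries.pop(url, None)
        for key in self._keys_of.pop(url, set()):
            self._by_key[key].discard(url)

    def purge_key(self, key):
        """Evict every entry tagged with `key`; return the evicted URLs."""
        urls = sorted(self._by_key.get(key, ()))
        for url in urls:
            self.evict(url)
        self._by_key.pop(key, None)
        return urls

    def get(self, url):
        return self._entries.get(url)


cache = KeyedCache()
cache.store("/wiki/Foo", "<html>desktop</html>", ["page:Foo"])
cache.store("/m/wiki/Foo", "<html>mobile</html>", ["page:Foo"])
cache.store("/wiki/Bar", "<html>bar</html>", ["page:Bar"])
print(cache.purge_key("page:Foo"))  # one purge covers both renderings of Foo
```

The sketch does not address the timing question raised above: with asynchronous updates, a key purge issued before the re-render finishes would simply cause the stale content to be cached again.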

Event Timeline

It is November 6, and this proposal doesn't seem to have much traction; it is not on track. Unless there is a sudden change, I will leave the ultimate decision of pre-scheduling it for the Wikimedia-Developer-Summit-2016 to @RobLa-WMF and the Architecture Committee.

Currently it is written as if the individual steps will only happen one at a time. Should something be added to indicate that for performance every step of it needs to support batching?

@JanZerebecki: The current intention is to keep change propagation relatively simple and efficient. Most services can be implemented with very small relative per-request overheads, and services with high per-request overheads can consider applying opportunistic batching transparently to all requests, for example using a batching proxy.
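As a rough illustration of the batching proxy idea, the sketch below collects individual requests and forwards them in groups, so callers keep sending one request at a time. The `Batcher` class and its size-based policy are assumptions; this is one simple policy, not a description of any deployed proxy.

```python
class Batcher:
    """Collect single requests and hand them to `send_batch` in groups."""

    def __init__(self, send_batch, max_size=3):
        self._send_batch = send_batch   # callable that takes a list of requests
        self._max_size = max_size
        self._pending = []

    def submit(self, request):
        """Accept one request; forward a batch once enough have accumulated."""
        self._pending.append(request)
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self):
        """Send whatever is pending. A real proxy would also call this on a short timer."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send_batch(batch)


sent = []
batcher = Batcher(sent.append, max_size=2)
for title in ["Foo", "Bar", "Baz"]:
    batcher.submit(title)
batcher.flush()
print(sent)
```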

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

RobLa-WMF mentioned this in Unknown Object (Event). May 11 2016, 12:09 AM
RobLa-WMF triaged this task as High priority.

Per E170 (I'm shepherding this one)

TechCom plans to discuss this next week in E184: RFC Meeting: RFC: Requirements for change propagation (2016-05-18, #wikimedia-office)

In order to have a productive conversation, we need to figure out which things are done and decided, and which items need broader consensus. Which questions should we focus on next week? Which questions need to be answered before this is approved or rejected? What does "approved" mean in this context?

@RobLa-WMF @daniel, I updated the task summary with a status summary & a sketch of next steps & open questions. Please have a look & tweak as necessary.

@GWicke - I made a conversion of the description to wikitext at mw:User:RobLa-WMF/T102476. The next step will be to convert it into mw.org RFC format. RFCs with long prose seem better to have over on mw.org, with a short description + link here on Phab; MediaWiki's editing and diffing tools are better (yay you folks!)

On E184, @GWicke said "I'm not opposed to moving the general portion especially to mw.org, lets just make sure we don't end up with multiple copies". My edits to the description reflect my understanding of that.

I moved the current status / next steps section back here, so that only the general background section is now on MediaWiki.org. Status & next steps is more volatile & links directly to ongoing work, so benefits from being in this task.

This was discussed on #wikimedia-office on 2016-05-18.
Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.html
Full log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.log.html

Summary:

  • ''LINK:'' https://phabricator.wikimedia.org/T102476 (gwicke, 21:01:03)
  • robla asks for response on T102476#2296335. gwicke answers "I'm not opposed to moving the general portion especially to mw.org, lets just make sure we don't end up with multiple copies" (robla, 21:12:37)
  • ''LINK:'' https://commons.wikimedia.org/wiki/Template:LangSwitch (matt_flaschen, 21:24:59)
  • significant use: track dependencies when rendering pages with {{int}} and <translate> so that they can be purged when conditional dependencies change (TimStarling, 21:33:47)
  • what needs purging is *generated* content, so we need to track what it depends on (DanielK_WMDE, 21:37:47)
  • ''LINK:'' https://phabricator.wikimedia.org/T130528 (DanielK_WMDE, 21:50:39)
  • ''ACTION:'' update the RFC to clarify the anticipated concrete dependency relations (TimStarling, 22:01:38)

During yesterday's meeting, it became apparent that it would be useful for me to summarize the requirements that Wikibase has for dependency tracking. I will try to do that below. The current system that Wikibase uses for this purpose is documented in docs/usagetracking.wiki.

Our assumptions, questions, and requirements:

  • in order to propagate changes, we need to track which resources or other artifacts a generated artifact depends on.
  • The tracking is effectively a DAG, with artifact URIs as the nodes. We may want to associate additional information though, like the touch date.
  • generated artifacts may depend on other generated artifacts, or on human edited resources.
  • a user edited resource, such as wikitext on a wiki page, never depends on anything (though it may reference other things).
    • MediaWiki uses link tables to track what wikitext references what; currently, we also use this information to infer when to purge the parser cache of which page. This system fails if different renderings have different dependencies.
  • a rendering of a resource depends at least on that resource, and possibly on other resources (typically the ones referenced in the resource) or other artifacts.
  • artifacts may be created on the fly during a GET request; their dependency then needs to be stored somewhere/somehow, causing the need to write during a GET.
    • example: a multilingual page (e.g. an image description page on Commons) is requested in a language for which there exists no rendering yet. Technically, this is due to things like {{langswitch}} and <translate>.
    • another thing the rendering could vary on is desktop vs. mobile.
    • ...not to speak of the skin, date format, stub threshold....
    • note: it is not computationally feasible to create all possible renderings in advance, to avoid generating them on the fly.
  • When purging caches (e.g. CDN cache or ParserCache), we need artifacts to be associated with multiple buckets. E.g. we need to be able to purge all renderings of X, or all renderings that depend on Y.
  • Artifacts may expire and get purged. Any tracked dependencies are then redundant, and could be pruned (garbage collection).
  • Conversely, artifacts that are no longer needed by any other artifact may be pruned / garbage collected.
  • How reliable do we need the dependency tracking to be? What happens if we lose this info?
  • How do we scale this beyond the Wikimedia cluster?
    • use case: InstantCommons and, in the future, InstantWikidata
    • this essentially needs a PubSub mechanism, with fairly high granularity. How to make it scale for a large subscriber base, and high update frequency?
  • Rough sketch of a dependency storage interface: put(x,y), drop(x,y), get(x), rget(y), purge(x), rpurge(y); batch operations: replace(x,y1,y2,y3,...), add(x,y1,y2,y3,...), remove(x,y1,y2,y3,...), plus possibly the inverse.
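The storage interface sketched in the last item could look roughly like the in-memory class below. The comment only lists operation names, so the semantics here are assumptions: `put(x, y)` records that artifact x depends on y, `purge(x)` forgets everything x depends on, and `rpurge(y)` forgets everything that depends on y. The `affected` helper is hypothetical and not part of the listed interface; it shows how the reverse index supports propagating a change through the DAG.

```python
from collections import defaultdict, deque


class DependencyStore:
    """In-memory dependency graph keyed by artifact URI (semantics assumed)."""

    def __init__(self):
        self._deps = defaultdict(set)    # x -> artifacts that x depends on
        self._rdeps = defaultdict(set)   # y -> artifacts that depend on y

    def put(self, x, y):
        self._deps[x].add(y)
        self._rdeps[y].add(x)

    def drop(self, x, y):
        self._deps[x].discard(y)
        self._rdeps[y].discard(x)

    def get(self, x):
        return set(self._deps.get(x, ()))

    def rget(self, y):
        return set(self._rdeps.get(y, ()))

    def purge(self, x):
        for y in self.get(x):
            self.drop(x, y)

    def rpurge(self, y):
        for x in self.rget(y):
            self.drop(x, y)

    # Batch operations
    def add(self, x, *ys):
        for y in ys:
            self.put(x, y)

    def remove(self, x, *ys):
        for y in ys:
            self.drop(x, y)

    def replace(self, x, *ys):
        self.purge(x)
        self.add(x, *ys)

    def affected(self, y):
        """Hypothetical helper: every artifact that transitively depends on y."""
        seen, queue = set(), deque([y])
        while queue:
            for x in self.rget(queue.popleft()):
                if x not in seen:
                    seen.add(x)
                    queue.append(x)
        return seen


store = DependencyStore()
store.add("render:Foo/en", "page:Foo", "template:Infobox")
store.put("summary:Foo", "render:Foo/en")
print(sorted(store.affected("template:Infobox")))
```

A persistent implementation would also have to answer the reliability and garbage-collection questions raised above, which this sketch ignores.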

I'm going to make a note here mainly for myself as shepherd. In E202, I noted I was going to take this off of our weekly action items. There's probably some work that needs to happen to incorporate Daniel's thinking expressed in T102476#2309963 into the document at https://www.mediawiki.org/wiki/Requests_for_comment/Requirements_for_change_propagation . @daniel and @GWicke seemed to believe that this RfC is on track, and would be a good example of an RFC where we might need the "on track" state we discussed (see T137860)

RobLa-WMF moved this task from Under discussion to In progress on the TechCom-RFC board.

I'm leaving myself marked as the shepherd on the TechCom-Has-shepherd board. The role of shepherd is mainly advisory, and not a good use for "assignee". Also, I'm moving this to the "In progress" column (called "on track" in T102476#2381742).

GWicke lowered the priority of this task from High to Low. Oct 12 2016, 10:28 PM
GWicke edited projects, added Services (watching); removed Services.

This seems obsolete. Is there any interest in keeping this open and continuing the RFC process here?

Change propagation is an important topic, but it seems we need a fresh start.

In other languages? Wikimedia's Director Katherine Maher in Wikimania 2017 mentioned potentially 7k languages in Wikipedia by 2030

daniel claimed this task.
daniel edited projects, added TechCom-RFC (TechCom-RFC-Closed); removed TechCom-RFC.

Closing this as Resolved: the RFC provided the guidance needed to implement the ChangeProp service.