At a high level, change propagation involves three main ingredients:
- distributing change events,
- processing and propagating those change events, and
- a dependency graph that determines where changes need to propagate.
The bulk of this RFC, which focuses on the general requirements & considerations, can be found here:
- T84923: Reliable publish / subscribe event bus is deployed and provides event streams for edits & resource changes. Under the hood, all topics are prefixed with the source datacenter & replicated (see T127718), which lets us cleanly move event processing between datacenters.
- T117933: Change propagation service, phase 1 is gradually being rolled out at the moment. Driven by a declarative config file, this service subscribes to EventBus topics, and processes events by making HTTP requests to other services, or by sending purges to Varnishes. Events are consumed from specific topics, and can be further filtered by arbitrary properties, including URL patterns.
- An example module for iterative backlink processing has already been created. This module cleanly separates the expansion of dependencies from their processing, and can serve as a model for further iterative dependency expansion.
- T126687: RFC: Publish all resource changes to a single topic introduced a single topic recording URL-based resource changes. This topic is intended to be used for CDN purges, and is already used to trigger secondary updates in the ChangeProp service. @Smalyshev and @aaron are looking into sending all MediaWiki CDN purge requests to this topic.
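To make the topic layout and rule-based filtering described above more concrete, here is a minimal Python sketch of how a ChangeProp-style consumer might expand a logical topic into datacenter-prefixed physical topics and match events against declarative rules. Everything in it is illustrative: the datacenter names, the `Rule` structure, dotted property paths such as `meta.uri`, and the URL pattern are assumptions, not the actual ChangeProp config format.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical datacenter names, used only for illustration.
DATACENTERS = ("eqiad", "codfw")


def prefixed_topics(topic: str, datacenters=DATACENTERS) -> list:
    """Expand a logical topic into one physical topic per source datacenter."""
    return [f"{dc}.{topic}" for dc in datacenters]


def lookup(event: dict, dotted_path: str) -> Optional[Any]:
    """Resolve a dotted property path such as 'meta.uri' in a nested event."""
    node: Any = event
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


@dataclass
class Rule:
    """A hypothetical declarative rule: a logical topic plus property filters."""
    name: str
    topic: str                                 # logical topic, without DC prefix
    match: dict = field(default_factory=dict)  # property path -> literal or compiled regex

    def matches(self, event: dict) -> bool:
        for path, expected in self.match.items():
            value = lookup(event, path)
            if value is None:
                return False
            if isinstance(expected, re.Pattern):
                if not expected.search(str(value)):
                    return False
            elif value != expected:
                return False
        return True


def dispatch(physical_topic: str, event: dict, rules: list) -> list:
    """Return the names of all rules that apply to an event from a physical topic."""
    # Strip the source-datacenter prefix; the remainder may itself contain dots.
    _dc, _, logical_topic = physical_topic.partition(".")
    return [r.name for r in rules if r.topic == logical_topic and r.matches(event)]


# Example rule (hypothetical): match resource_change events for REST HTML URLs.
rules = [
    Rule(
        name="purge_rest_html",
        topic="resource_change",
        match={"meta.uri": re.compile(r"^https://[^/]+/api/rest_v1/page/html/")},
    )
]
event = {"meta": {"uri": "https://en.wikipedia.org/api/rest_v1/page/html/Foo"}}
print(prefixed_topics("resource_change"))  # → ['eqiad.resource_change', 'codfw.resource_change']
print(dispatch("codfw.resource_change", event, rules))  # → ['purge_rest_html']
```

Because the datacenter prefix is only stripped at dispatch time, the same rule set can run in either datacenter; which physical topics an instance subscribes to becomes a deployment decision rather than a config change.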
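The separation between dependency expansion and processing, as in the backlink module mentioned above, can be sketched as follows. It assumes a hypothetical `fetch` function that returns one batch of dependents plus a continuation token. Each expansion step does a bounded amount of work and emits a continuation event rather than looping; an in-memory `deque` stands in for re-publishing that event to a topic. This is not the actual ChangeProp module, only an illustration of the pattern.

```python
from collections import deque
from typing import Callable, Optional

# A fetch function returns (batch_of_titles, continuation_token_or_None).
# This signature is a hypothetical stand-in for a real backlinks query.
Fetch = Callable[[str, Optional[str], int], tuple]


def expand_step(event: dict, fetch: Fetch, batch_size: int):
    """Expand one batch of dependencies; performs no processing itself."""
    titles, cont = fetch(event["title"], event.get("continue"), batch_size)
    derived = [{"type": "rerender", "title": t} for t in titles]
    # If more dependents remain, emit a continuation event instead of looping here.
    follow_up = {**event, "continue": cont} if cont is not None else None
    return derived, follow_up


def run(initial_event: dict, fetch: Fetch, process: Callable[[dict], None], batch_size: int = 2):
    """Drive expansion iteratively; the deque stands in for re-publishing to a topic."""
    pending = deque([initial_event])
    while pending:
        event = pending.popleft()
        derived, follow_up = expand_step(event, fetch, batch_size)
        for d in derived:
            process(d)  # processing is a separate, pluggable stage
        if follow_up is not None:
            pending.append(follow_up)


# Example: in-memory backlink table with offset-based continuation tokens.
BACKLINKS = {"Template:Infobox": ["A", "B", "C", "D", "E"]}


def fetch_backlinks(title: str, cont: Optional[str], limit: int):
    start = int(cont or 0)
    items = BACKLINKS.get(title, [])
    end = start + limit
    return items[start:end], (str(end) if end < len(items) else None)


processed = []
run({"title": "Template:Infobox"}, fetch_backlinks, processed.append)
print([e["title"] for e in processed])  # → ['A', 'B', 'C', 'D', 'E']
```

Keeping each step small means a failure only has to retry one batch, and a large expansion (for example, a widely used template) does not monopolize a single worker.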
Next steps and open questions
- ChangeProp service expansion: ChangeProp will gradually expand to cover more use cases. Initially the services team will focus on RESTBase's use cases (including red link, template & media re-renders), and will also move CDN purging from RESTBase to ChangeProp.
- Reliable CDN purging: There have been various discussions about making CDN purging more reliable. Current ideas include running Kafka clients on each Varnish node, which would effectively replace the best-effort multicast setup. Separately, there are plans to reduce purge volume by using alternate keys (like Varnish's XKey / T122881), so that a single purge would match all resources associated with an underlying resource such as a page. However, with asynchronous updates it is tricky to determine the best time to issue such a purge, so we will still need to issue several purges after a primary event. Coordinating these purges is currently an open problem.
- Reliable RCStream: @Ottomata has been looking into leveraging Kafka events in RCStream. This can potentially let clients catch up after being disconnected. See T130651: EventStreams.
- Cross-project dependency tracking & change propagation: We currently don't have any general way to track dependencies across projects. Special-case mechanisms were developed for Commons and, to some degree, Wikidata, but other applications (like T91162: RFC: Shadow namespaces) will need dependency tracking abilities as well. It would be good to generalize this infrastructure, so that efforts can be shared across several use cases. Open questions in this space:
  - API requirements & possible designs for querying & updating dependencies.
  - Dependency graph storage: T105766: RFC: Dependency graph storage; sketch: adjacency list in DB discusses some options for storing such dependencies in a general manner, but it's early days & we should probably make our requirements more precise before diving too deeply into the concrete design.
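Purging by alternate key can be illustrated with a toy in-memory cache in which every stored URL is tagged with one or more keys, loosely modelled on Varnish's xkey (this is not its API). The key format `page:<wiki>:<title>` and the example URLs are hypothetical. A single `purge_key` call evicts every representation of a page, which is where the reduction in purge volume comes from; it does not by itself address the timing problem described above.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache with purge-by-key, loosely modelled on Varnish xkey (not its API)."""

    def __init__(self):
        self._bodies = {}                     # url -> cached body
        self._urls_by_key = defaultdict(set)  # key -> urls tagged with that key

    def store(self, url, body, keys):
        self._bodies[url] = body
        for key in keys:
            self._urls_by_key[key].add(url)

    def get(self, url):
        return self._bodies.get(url)

    def purge_key(self, key):
        """Evict every URL tagged with `key`; return how many entries were evicted."""
        evicted = 0
        for url in self._urls_by_key.pop(key, set()):
            if self._bodies.pop(url, None) is not None:
                evicted += 1
        return evicted


# Hypothetical key format and URLs: all representations of one page share a key.
cache = TaggedCache()
key = "page:enwiki:Foo"
cache.store("https://en.wikipedia.org/wiki/Foo", "<html>desktop</html>", [key])
cache.store("https://en.m.wikipedia.org/wiki/Foo", "<html>mobile</html>", [key])
cache.store("https://en.wikipedia.org/api/rest_v1/page/html/Foo", "<html>rest</html>", [key])
cache.store("https://en.wikipedia.org/wiki/Bar", "<html>bar</html>", ["page:enwiki:Bar"])

print(cache.purge_key(key))  # → 3 (one purge instead of three)
```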
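The catch-up behaviour that Kafka could give RCStream clients comes from consumers tracking their position (offset) in an append-only log. A minimal sketch, with an in-memory list standing in for a single Kafka partition and a hypothetical `ResumingClient` that remembers the next offset it needs: after a disconnect it resumes exactly where it left off, without gaps or duplicates. With real Kafka, catching up only works while the missed events are still within the topic's retention period.

```python
class EventLog:
    """Append-only log with sequential offsets; a stand-in for one Kafka partition."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset assigned to the new event

    def read_from(self, offset, limit=100):
        """Return up to `limit` (offset, event) pairs starting at `offset`."""
        end = min(offset + limit, len(self._events))
        return [(o, self._events[o]) for o in range(offset, end)]


class ResumingClient:
    """Hypothetical stream client that remembers the next offset it needs."""

    def __init__(self, log):
        self.log = log
        self.next_offset = 0
        self.received = []

    def poll(self):
        for offset, event in self.log.read_from(self.next_offset):
            self.received.append(event)
            self.next_offset = offset + 1


log = EventLog()
client = ResumingClient(log)
log.append("edit 1")
log.append("edit 2")
client.poll()
# The client disconnects; events keep arriving in the meantime.
log.append("edit 3")
log.append("edit 4")
# On reconnect, the client resumes from its stored offset.
client.poll()
print(client.received)  # → ['edit 1', 'edit 2', 'edit 3', 'edit 4']
```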
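As one illustration of the adjacency-list option discussed in T105766, the sketch below stores dependency edges in a single SQLite table, offers a small API for updating and querying them, and uses a recursive CTE to find everything that transitively depends on a resource. The schema, the function names, and the project-prefixed identifiers such as `commonswiki:File:Logo.svg` are hypothetical; SQLite is used only to keep the example self-contained, not as a storage recommendation.

```python
import sqlite3


def open_graph(path=":memory:"):
    """Create (or open) a hypothetical adjacency-list dependency table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dependency (
               source TEXT NOT NULL,  -- the resource being depended upon
               target TEXT NOT NULL,  -- the resource that uses `source`
               PRIMARY KEY (source, target)
           )"""
    )
    # Secondary index so "what does X depend on?" is also cheap.
    conn.execute("CREATE INDEX IF NOT EXISTS dependency_target ON dependency (target)")
    return conn


def set_dependencies(conn, target, sources):
    """Replace the complete dependency set of `target`, e.g. after a re-render."""
    with conn:  # one transaction: delete old edges, insert new ones
        conn.execute("DELETE FROM dependency WHERE target = ?", (target,))
        conn.executemany(
            "INSERT OR IGNORE INTO dependency (source, target) VALUES (?, ?)",
            [(source, target) for source in sources],
        )


def transitive_dependents(conn, source):
    """Everything that directly or indirectly depends on `source`, sorted."""
    rows = conn.execute(
        """WITH RECURSIVE dependents(node) AS (
               SELECT target FROM dependency WHERE source = ?
               UNION
               SELECT d.target FROM dependency AS d
               JOIN dependents ON d.source = dependents.node
           )
           SELECT node FROM dependents ORDER BY node""",
        (source,),
    )
    return [row[0] for row in rows]


# Project-prefixed IDs let one graph span several wikis (hypothetical format).
conn = open_graph()
set_dependencies(conn, "enwiki:Template:Infobox", ["commonswiki:File:Logo.svg"])
set_dependencies(conn, "enwiki:Foo", ["enwiki:Template:Infobox"])
set_dependencies(conn, "dewiki:Bar", ["commonswiki:File:Logo.svg"])
print(transitive_dependents(conn, "commonswiki:File:Logo.svg"))
# → ['dewiki:Bar', 'enwiki:Foo', 'enwiki:Template:Infobox']
```

Using `UNION` rather than `UNION ALL` in the recursive query discards rows that were already produced, so the traversal terminates even if the dependency graph contains cycles.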
See also
- T84923: Reliable publish / subscribe event bus: reliable event distribution with publish / subscribe queues
- T105766: RFC: Dependency graph storage; sketch: adjacency list in DB
- T117933: Change propagation service, phase 1
- T126687: RFC: Publish all resource changes to a single topic
- T105845: RFC: Page components / content widgets: partly about clearly documenting dependencies for each piece of rendered content
- T88459: Implementing the reliable event bus using Kafka
- Feeding Frenzy: Selectively Materializing Users’ Event Feeds (Silberstein et al., SIGMOD 2010): explores the trade-off between push- and pull-based propagation
- Twitter architecture summary, 2013
- Google Percolator (Peng & Dabek, USENIX OSDI 2010): incremental processing / change propagation system with transactional updates, built on Bigtable
- T48525: Build an interwiki notifications framework and implement it for InstantCommons