RFC: Requirements for change propagation
Closed, Resolved · Public

Authored By: GWicke, Jun 15 2015, 2:04 PM

Description

At a high level, change propagation involves three main ingredients:

- distribution of change events,
- processing and propagation of those change events, and
- a dependency graph that determines where each change needs to propagate.
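To illustrate how these ingredients fit together, here is a minimal Python sketch, assuming a hypothetical in-memory reverse-dependency map (`dependents`) from a resource URI to the artifacts generated from it. A change event for one resource is propagated breadth-first to every artifact that directly or transitively depends on it. The URIs are made up for illustration; this is not how any deployed service stores its graph.

```python
from collections import deque


def affected_artifacts(changed: str, dependents: dict[str, set[str]]) -> list[str]:
    """Return every artifact that transitively depends on `changed`.

    `dependents` maps a node to the nodes that depend on it (the reverse edges
    of the dependency graph). Breadth-first order means direct dependents are
    listed before second-order ones.
    """
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in sorted(dependents.get(node, ())):  # sorted for deterministic output
            if dep not in seen:  # shared dependents (or cycles) are visited only once
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


if __name__ == "__main__":
    # Hypothetical graph: a template feeds two pages' HTML; one page also has a mobile rendering.
    graph = {
        "wiki:Template:Infobox": {"html:Page_A", "html:Page_B"},
        "html:Page_A": {"mobile:Page_A"},
    }
    print(affected_artifacts("wiki:Template:Infobox", graph))
    # prints ['html:Page_A', 'html:Page_B', 'mobile:Page_A']
```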

The bulk of this RFC, covering the general requirements and considerations, can be found here:
https://www.mediawiki.org/wiki/Requests_for_comment/Requirements_for_change_propagation

Current status

T84923: Reliable publish / subscribe event bus is deployed, and provides event streams for edits & resource changes. Under the hood, all topics are prefixed by source datacenter & replicated (see T127718), which lets us cleanly move event processing between datacenters.
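A minimal sketch of the datacenter-prefixing idea, assuming topic names of the form `<datacenter>.<stream>` (for example `eqiad.mediawiki.revision-create`). The exact naming scheme, the stream name and the helper functions are illustrative assumptions. Because each producer writes only to its own datacenter's prefix and every prefixed topic is replicated, a consumer in either datacenter sees the whole logical stream by subscribing to all prefixes, which is what lets processing move between datacenters.

```python
DATACENTERS = ("eqiad", "codfw")  # assumed set of datacenters


def producer_topic(local_dc: str, stream: str) -> str:
    """Topic that a producer running in `local_dc` writes to for a logical stream."""
    return f"{local_dc}.{stream}"


def consumer_topics(stream: str, datacenters: tuple[str, ...] = DATACENTERS) -> list[str]:
    """All replicated topics a consumer reads to see the complete logical stream."""
    return [f"{dc}.{stream}" for dc in datacenters]


def split_topic(topic: str) -> tuple[str, str]:
    """Split a prefixed topic back into (source datacenter, logical stream)."""
    dc, _, stream = topic.partition(".")
    return dc, stream
```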

T117933: Change propagation service, phase 1 is currently being rolled out gradually. Driven by a declarative config file, this service subscribes to EventBus topics and processes events by making HTTP requests to other services or by sending purges to Varnish instances. Events are consumed from specific topics and can be further filtered by arbitrary properties, including URL patterns.
- An example module for iterative backlink processing has already been created. This module cleanly separates the expansion of dependencies from their processing, and can serve as a model for further iterative dependency expansion.
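One way to separate dependency expansion from processing, sketched in Python under stated assumptions: a hypothetical `fetch_backlinks(title, cont)` function returns one batch of backlinking titles plus a continuation token (similar in spirit to MediaWiki API continuation), and the expander emits a follow-up expansion job carrying that token instead of looping. Every emitted job can be queued and handled independently, so a template with a very large number of backlinks is never expanded in a single step. The job classes and the FIFO runner are illustrative, not the actual module.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical backend: (title, continuation) -> (batch of backlinks, next continuation or None)
FetchFn = Callable[[str, Optional[str]], tuple[list[str], Optional[str]]]


@dataclass
class ExpandJob:
    title: str
    cont: Optional[str] = None


@dataclass
class ProcessJob:
    title: str


def expand(job: ExpandJob, fetch_backlinks: FetchFn) -> list:
    """Turn one batch of backlinks into processing jobs, plus a continuation job if needed."""
    batch, next_cont = fetch_backlinks(job.title, job.cont)
    jobs: list = [ProcessJob(t) for t in batch]
    if next_cont is not None:
        # Defer the rest of the expansion to a new job rather than recursing here.
        jobs.append(ExpandJob(job.title, next_cont))
    return jobs


def run(queue: list, fetch_backlinks: FetchFn) -> list[str]:
    """Drain a simple FIFO queue and return the titles that were processed, in order."""
    processed = []
    while queue:
        job = queue.pop(0)
        if isinstance(job, ExpandJob):
            queue.extend(expand(job, fetch_backlinks))
        else:
            processed.append(job.title)  # stand-in for a re-render or purge request
    return processed
```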
T126687: RFC: Publish all resource changes to a single topic introduced a single topic recording URL-based resource changes. This topic is intended to be used for CDN purges, and is already used to trigger secondary updates in the ChangeProp service. @Smalyshev and @aaron are looking into sending all MediaWiki CDN purge requests to this topic.
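The declarative, property-based filtering described for the ChangeProp service can be sketched as follows. The rule structure (a `topic`, an optional `match` dict and an `action`) is a hypothetical simplification, not the service's actual config format, and the topic name `resource_change` is assumed for illustration. Plain values in `match` are compared for equality, compiled regular expressions are searched against the property (useful for URL patterns), and nested dicts descend into nested event properties.

```python
import re
from typing import Any


def matches(pattern: Any, value: Any) -> bool:
    """Check one event property against one rule condition."""
    if isinstance(pattern, re.Pattern):
        return isinstance(value, str) and pattern.search(value) is not None
    if isinstance(pattern, dict):
        # Every key in the condition must exist in the event and match recursively.
        return isinstance(value, dict) and all(
            k in value and matches(p, value[k]) for k, p in pattern.items()
        )
    return pattern == value


def select_actions(rules: list[dict], topic: str, event: dict) -> list[str]:
    """Return the actions of all rules subscribed to `topic` whose `match` holds."""
    return [
        rule["action"]
        for rule in rules
        if rule["topic"] == topic and matches(rule.get("match", {}), event)
    ]


# Hypothetical rules: purge on changes to enwiki page URLs; re-render on any resource change.
RULES = [
    {
        "topic": "resource_change",
        "match": {"meta": {"uri": re.compile(r"^https://en\.wikipedia\.org/wiki/")}},
        "action": "purge_varnish",
    },
    {"topic": "resource_change", "action": "rerender_summary"},
]
```

Keeping rules as data rather than code is the point of a declarative config: adding a new secondary update means adding a rule, not changing the consumer.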

Next steps and open questions

ChangeProp service expansion: ChangeProp will gradually expand to cover more use cases. Initially the services team will focus on RESTBase's use cases (including red link, template & media re-renders), and will also move CDN purging from RESTBase to ChangeProp.

Reliable CDN purging: There have been various discussions about making CDN purging more reliable. Current ideas include running Kafka clients on each Varnish node, which would effectively replace the best-effort multicast setup. There are also plans to reduce the purge volume by using alternate keys (such as Varnish's XKey; see T122881), so that a single purge matches all resources associated with an underlying resource such as a page. With asynchronous updates, however, it is tricky to determine the best time to issue such a purge, so we will still need to issue several purges after a primary event. Coordinating these is currently an open problem.
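To make the alternate-key idea concrete, here is a toy in-memory Python model of a cache in which each stored URL is tagged with one or more keys, so purging one key evicts every URL tagged with it. It models only the semantics; it is not XKey's actual interface, and the key and URL names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


class TaggedCache:
    """Toy cache whose entries carry alternate (surrogate) keys."""

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}
        self._urls_by_key: defaultdict[str, set[str]] = defaultdict(set)

    def store(self, url: str, body: str, keys: list[str]) -> None:
        """Cache `body` under `url` and tag it with every key in `keys`."""
        self._objects[url] = body
        for key in keys:
            self._urls_by_key[key].add(url)

    def get(self, url: str) -> Optional[str]:
        return self._objects.get(url)

    def purge_key(self, key: str) -> int:
        """Evict every URL tagged with `key`; return how many objects were evicted."""
        evicted = 0
        for url in self._urls_by_key.pop(key, set()):
            if self._objects.pop(url, None) is not None:
                evicted += 1
        return evicted
```

In this model a single `purge_key("page:Foo")` replaces one purge per URL. The timing problem described above is not solved by it: if the key is purged before an asynchronous re-render has finished, a request arriving in between can re-cache the old content.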

Reliable RCStream: @Ottomata has been looking into leveraging Kafka events in RCStream. This can potentially let clients catch up after being disconnected. See T130651: EventStreams.
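The catch-up property comes from the log retaining events under monotonically increasing offsets. The Python model below (a plain list standing in for one Kafka partition; all class and method names are hypothetical) shows a client that remembers the offset after the last event it saw and, after reconnecting, asks for everything from there instead of losing the events published while it was away. If the client's offset has already fallen out of retention, it can only resume from the oldest retained event.

```python
class EventLog:
    """Append-only log with bounded retention, standing in for one Kafka partition."""

    def __init__(self, retention: int = 1000) -> None:
        self.retention = retention
        self._events: list = []
        self._first_offset = 0  # offset of the oldest event still retained

    def append(self, event: str) -> int:
        """Append an event and return its offset."""
        self._events.append(event)
        if len(self._events) > self.retention:
            self._events.pop(0)  # the oldest event falls out of retention
            self._first_offset += 1
        return self._first_offset + len(self._events) - 1

    def read_from(self, offset: int) -> list:
        """Return (offset, event) pairs at or after `offset`, clamped to retention."""
        start = max(offset, self._first_offset)
        end = self._first_offset + len(self._events)
        return [(o, self._events[o - self._first_offset]) for o in range(start, end)]


class Client:
    """Stream client that remembers where it stopped, so it can resume after a disconnect."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.next_offset = 0

    def poll(self) -> list:
        batch = self.log.read_from(self.next_offset)
        if batch:
            self.next_offset = batch[-1][0] + 1
        return [event for _, event in batch]
```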

Cross-project dependency tracking & change propagation: We currently don't have any general way to track dependencies across projects. Special-case mechanisms were developed for Commons and, to some degree, Wikidata, but other applications (like T91162: RFC: Shadow namespaces) will need dependency tracking abilities as well. It would be good to generalize this infrastructure, so that efforts can be shared across several use cases. Open questions in this space:
- API requirements & possible designs for querying & updating dependencies.
- Dependency graph storage: T105766: RFC: Dependency graph storage; sketch: adjacency list in DB discusses some options for storing such dependencies in a general manner, but it's early days & we should probably make our requirements more precise before diving too deeply into the concrete design.
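As an illustration of the adjacency-list option, here is a minimal version using Python's built-in sqlite3 module. The table, column and function names are assumptions, and a production store would have very different scaling needs. The composite primary key serves forward lookups (what does X depend on?), and the second index serves the reverse lookups needed to find what must be updated when Y changes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS dependency_edge (
    dependent  TEXT NOT NULL,  -- URI of the generated artifact
    dependency TEXT NOT NULL,  -- URI of the resource or artifact it was built from
    PRIMARY KEY (dependent, dependency)
);
CREATE INDEX IF NOT EXISTS dependency_edge_reverse ON dependency_edge (dependency, dependent);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add(conn: sqlite3.Connection, dependent: str, dependencies: list[str]) -> None:
    """Record that `dependent` depends on each of `dependencies` (duplicates ignored)."""
    with conn:  # one transaction per batch
        conn.executemany(
            "INSERT OR IGNORE INTO dependency_edge (dependent, dependency) VALUES (?, ?)",
            [(dependent, d) for d in dependencies],
        )


def dependencies_of(conn: sqlite3.Connection, dependent: str) -> list[str]:
    rows = conn.execute(
        "SELECT dependency FROM dependency_edge WHERE dependent = ? ORDER BY dependency",
        (dependent,),
    )
    return [r[0] for r in rows]


def dependents_of(conn: sqlite3.Connection, dependency: str) -> list[str]:
    rows = conn.execute(
        "SELECT dependent FROM dependency_edge WHERE dependency = ? ORDER BY dependent",
        (dependency,),
    )
    return [r[0] for r in rows]
```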

Related Objects

Status	Assigned	Task
Resolved	aaron	T88445 MediaWiki active/active datacenter investigation and work (tracking)
Resolved	aaron	T97562 WANObjectCache relay daemon or mcrouter support
Resolved	Ottomata	T123954 Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters.
		Restricted Task
Duplicate	None	T109331 Deleted files sometimes remain visible to non-privileged users if permanently linked
Duplicate	None	T133819 upload-lb.ulsfo.wikimedia.org still allow access to some deleted files
Duplicate	BBlack	T119038 Image cache issue when 'over-writing' an image on commons
Resolved	ema	T133821 Make CDN purges reliable
Resolved	daniel	T102476 RFC: Requirements for change propagation
Resolved	GWicke	T84923 Reliable publish / subscribe event bus
Resolved	Ottomata	T88459 Implementing the reliable event bus using Kafka
Invalid	Ottomata	T110748 Event Bus
Resolved	Ottomata	T110750 Investigate improving Confluent REST Proxy and Schema Registry for Event Bus
Resolved	Ottomata	T114443 EventBus MVP
Resolved	RobH	T114191 Setup a 2 server Kafka instance in both eqiad and codfw for reliable purge streams
		Unknown Object (Task)
		Unknown Object (Task)
		Unknown Object (Task)
		Unknown Object (Task)
Resolved	Ottomata	T121553 setup kafka1001 & kafka1002
Resolved	Cmjohnson	T121578 Rack 8 new misc servers
Resolved	elukey	T121558 setup kafka2001 & kafka2002
Resolved	RobH	T120885 codfw: rack 8 new misc systems
Resolved	mobrovac	T116247 Define edit related events for change propagation
Resolved	Eevans	T116786 Integrate eventbus-based event production into MediaWiki
Declined	csteipp	T120133 security review of ramsey/uuid
Resolved	csteipp	T120212 Security review of EventBus extension
Resolved	GWicke	T120409 RESTBase should honor wiki-wide deletion/suppression of users
Resolved	ssastry	T125266 Remove user name and edit comment from html <head>
Resolved	Pchelolo	T122079 Update EventBus extension to produce User-block events
Resolved	Ottomata	T122077 Define schema for a User-block event
Resolved	Ottomata	T118578 Package EventLogging and dependencies for Jessie
Resolved	Ottomata	T118761 Move EventLogging/server to its own repo and set up CI
Resolved	Ottomata	T118780 Puppetize eventlogging-service
Resolved	madhuvishy	T118903 Make eventlogging logs configurable via python config file [5 pts] {oryx}
Resolved	Ottomata	T118863 Deploy eventlogging from new repository [5 pts]
Resolved	madhuvishy	T118869 Send HTTP stats about eventlogging-service to statsd [3 pts]
Resolved	Ottomata	T121112 Build tornado-sprocket python packages
Resolved	mobrovac	T128463 New Service Request - Change Propagation
Resolved	mobrovac	T130948 Scap3 promote stage not working

Event Timeline

There are a very large number of changes, so older changes are hidden.

Qgil moved this task from Missing expected fields to Missing active discussion on the Wikimedia-Developer-Summit-2016 board. (Oct 22 2015, 8:40 AM)

intracer subscribed. (Oct 27 2015, 10:37 PM)

GWicke mentioned this in T117933: Change propagation service, phase 1. (Nov 6 2015, 12:12 AM)

GWicke updated the task description.

Restricted Application added a subscriber: StudiesWorld. (Nov 6 2015, 12:14 AM)

As of November 6, this proposal doesn't seem to have much traction; it is not on track. Unless there is a sudden change, I will leave the ultimate decision on pre-scheduling it for the Wikimedia-Developer-Summit-2016 to @RobLa-WMF and the Architecture Committee.

mobrovac mentioned this in T118162: Wikibase dispatchChanges.php runs slow, creates an absurd amount of database connections. (Nov 18 2015, 5:05 PM)

RobLa-WMF mentioned this in T119029: WikiDev 16 working area: Content access and APIs. (Nov 19 2015, 12:47 AM)

JanZerebecki added a project: Wikidata. (Dec 8 2015, 1:15 PM)

Addshore subscribed. (Dec 8 2015, 1:53 PM)

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. (Dec 11 2015, 10:11 AM)

JanZerebecki updated the task description. (Dec 23 2015, 8:08 AM)

Currently, the RFC is written as if the individual steps only happen one at a time. Should something be added to indicate that, for performance, every step needs to support batching?

GWicke updated the task description. (Dec 28 2015, 11:51 PM)

@JanZerebecki: The current intention is to keep change propagation relatively simple and efficient. Most services can be implemented with very small relative per-request overheads, and services with high per-request overheads can consider applying opportunistic batching transparently to all requests, for example using a batching proxy.
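Opportunistic batching of this kind can be sketched with asyncio: callers submit individual requests and await their own results, while the batcher collects requests for up to `max_delay` seconds or `max_size` items and then sends them to the backend as one batch. The `backend` callable stands in for a hypothetical service endpoint that accepts a list of requests and returns results in the same order; the class name and parameter values are illustrative, not an existing proxy.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Batcher:
    """Collect individual requests and forward them to `backend` in batches."""

    def __init__(self, backend: Callable[[list], Awaitable[list]],
                 max_size: int = 50, max_delay: float = 0.01) -> None:
        self.backend = backend      # hypothetical: takes a list, returns results in order
        self.max_size = max_size    # flush as soon as this many requests are queued
        self.max_delay = max_delay  # ...or after this many seconds, whichever comes first
        self._pending: list = []    # (request, future) pairs awaiting the next flush
        self._timer: Optional[asyncio.TimerHandle] = None
        self._tasks: set = set()    # keep references to in-flight batch tasks

    async def submit(self, request):
        """Queue one request and wait for its individual result."""
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self._pending.append((request, fut))
        if len(self._pending) >= self.max_size:
            self._flush()
        elif self._timer is None:
            self._timer = loop.call_later(self.max_delay, self._flush)
        return await fut

    def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()  # harmless if the timer is what triggered this flush
            self._timer = None
        batch, self._pending = self._pending, []
        if batch:
            task = asyncio.ensure_future(self._send(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _send(self, batch: list) -> None:
        try:
            results = await self.backend([req for req, _ in batch])
        except Exception as exc:  # fail every caller in the batch
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
            return
        for (_, fut), result in zip(batch, results):
            if not fut.done():  # the caller may have been cancelled meanwhile
                fut.set_result(result)
```

Because batching happens inside the proxy, the change propagation side keeps issuing simple per-item requests, which matches the intention of keeping change propagation itself simple.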

mobrovac closed subtask T116247: Define edit related events for change propagation as Resolved. (Jan 12 2016, 7:21 PM)

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

GWicke closed subtask T84923: Reliable publish / subscribe event bus as Resolved. (Jan 22 2016, 5:47 PM)

hoo subscribed. (Jan 26 2016, 7:41 PM)

ArielGlenn subscribed. (Feb 8 2016, 1:28 PM)

RobLa-WMF mentioned this in T125865: Assign RFCs to ArchCom shepherds. (Feb 10 2016, 8:15 PM)

GWicke removed a project: Wikimedia-Developer-Summit-2016. (Feb 11 2016, 11:34 PM)

GWicke updated the task description. (Feb 11 2016, 11:40 PM)

Qgil unsubscribed. (Feb 12 2016, 8:33 AM)

GWicke removed GWicke as the assignee of this task. (Mar 23 2016, 9:17 PM)

BBlack mentioned this in T133821: Make CDN purges reliable. (Apr 28 2016, 12:08 AM)

BBlack added a parent task: T133821: Make CDN purges reliable.

Danny_B added a project: Proposal. (May 2 2016, 10:14 PM)

RobLa-WMF mentioned this in Unknown Object (Event). (May 11 2016, 12:09 AM)

Per E170 (I'm shepherding this one)

RobLa-WMF mentioned this in E184: RFC Meeting: RFC: Requirements for change propagation (2016-05-18, #wikimedia-office). (May 11 2016, 9:42 PM)

RobLa-WMF mentioned this in E171: RFC Meeting: Overhaul Interwiki map, unify with Sites and WikiMap (2016-05-11, #wikimedia-office). (May 11 2016, 10:10 PM)

TechCom plans to discuss this next week in E184: RFC Meeting: RFC: Requirements for change propagation (2016-05-18, #wikimedia-office)

In order to have a productive conversation, we need to figure out which things are done and decided, and which items need broader consensus. Which questions should we focus on next week? Which questions need to be answered before this is approved or rejected? What does "approved" mean in this context?

RobLa-WMF mentioned this in T113034: RFC: Overhaul Interwiki map, unify with Sites and WikiMap. (May 11 2016, 11:50 PM)

GWicke updated the task description. (May 13 2016, 10:02 PM)

GWicke added subscribers: Smalyshev, Ottomata.

@RobLa-WMF @daniel, I updated the task summary with a status summary & a sketch of next steps & open questions. Please have a look & tweak as necessary.

GWicke updated the task description. (May 13 2016, 11:12 PM)

@GWicke - I converted the description to wikitext at mw:User:RobLa-WMF/T102476. The next step will be to convert it into mw.org RFC format. RFCs with long prose seem better hosted on mw.org, with a short description + link here on Phab; MediaWiki's editing and diffing tools are better (yay you folks!)

RobLa-WMF moved this task from Under discussion to Request IRC meeting on the TechCom-RFC board. (May 18 2016, 5:19 PM)

mobrovac added a project: Event-Platform. (May 18 2016, 5:29 PM)

RobLa-WMF mentioned this in T128351: Notifications should be in core. (May 18 2016, 6:20 PM)

Scott_WUaS updated the task description. (May 18 2016, 9:13 PM)

Scott_WUaS subscribed.

RobLa-WMF updated the task description. (May 18 2016, 10:21 PM)

On E184, @GWicke said "I'm not opposed to moving the general portion especially to mw.org, lets just make sure we don't end up with multiple copies" My edits to the description reflect my understanding of that.

GWicke updated the task description. (May 18 2016, 10:41 PM)

I moved the current status / next steps section back here, so that only the general background section is now on MediaWiki.org. Status & next steps is more volatile & links directly to ongoing work, so benefits from being in this task.

RobLa-WMF mentioned this in P3128 2016-05-18 ArchCom-RFC meeting (#wikimedia-office). (May 19 2016, 12:40 AM)

This was discussed on #wikimedia-office on 2016-05-18.
Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.html
Full log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.log.html

Summary:

LINK: https://phabricator.wikimedia.org/T102476 (gwicke, 21:01:03)
robla asks for response on T102476#2296335. gwicke answers "I'm not opposed to moving the general portion especially to mw.org, lets just make sure we don't end up with multiple copies" (robla, 21:12:37)
LINK: https://commons.wikimedia.org/wiki/Template:LangSwitch (matt_flaschen, 21:24:59)
significant use: track dependencies when rendering pages with {{int}} and <translate> so that they can be purged when conditional dependencies change (TimStarling, 21:33:47)
what needs purging is *generated* content, so we need to track what it depends on (DanielK_WMDE, 21:37:47)
LINK: https://phabricator.wikimedia.org/T130528 (DanielK_WMDE, 21:50:39)
ACTION: update the RFC to clarify the anticipated concrete dependency relations (TimStarling, 22:01:38)

During yesterday's meeting, it became apparent that it would be useful for me to summarize the requirements that Wikibase has for dependency tracking. I will try to do that below. The current system that Wikibase uses for this purpose is documented in docs/usagetracking.wiki.

Our assumptions, questions, and requirements:

in order to propagate changes, we need to track which resources or other artifacts a generated artifact depends on.
The tracking is effectively a DAG, with artifact URIs as the nodes. We may want to associate additional information, though, such as the touch date.
generated artifacts may depend on other generated artifacts, or on human edited resources.
a user edited resource, such as wikitext on a wiki page, never depends on anything (though it may reference other things).
- MediaWiki uses link tables to track what wikitext references what; currently, we also use this information to infer when to purge the parser cache of which page. This system fails if different renderings have different dependencies.
a rendering of a resource depends at least on that resource, and possibly on other resources (typically the ones referenced in the resource) or other artifacts.
artifacts may be created on the fly during a GET request; their dependencies then need to be stored somewhere/somehow, which means writing during a GET.
- example: a multilingual page (e.g. an image description page on commons) is requested in a language for which there exists no rendering yet. Technically, this is due to things like {{langswitch}} and <translate>.
- another thing the rendering could vary on is desktop vs. mobile.
- ...not to speak of the skin, date format, stub threshold....
- note: it is not computationally feasible to avoid on-the-fly generation by creating all possible renderings in advance.
When purging caches (e.g. CDN cache or ParserCache), we need artifacts to be associated with multiple buckets. E.g. we need to be able to purge all renderings of X, or all renderings that depend on Y.
Artifacts may expire and get purged. Any tracked dependencies are then redundant, and could be pruned (garbage collection).
Conversely, artifacts that are no longer needed by any other artifact may be pruned / garbage collected.
How reliable do we need the dependency tracking to be? What happens if we lose this info?
How do we scale this beyond the wikimedia cluster?
- use case: InstantCommons and, in the future, InstantWikidata
- this essentially needs a PubSub mechanism, with fairly high granularity. How to make it scale for a large subscriber base, and high update frequency?
Rough sketch of a dependency storage interface: put(x,y), drop(x,y), get(x), rget(y), purge(x), rpurge(y); batch operations: replace(x,y1,y2,y3,...), add(x,y1,y2,y3,...), remove(x,y1,y2,y3,...), plus possibly the inverse.
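The rough storage interface in the last item can be modelled in memory as two mirrored maps: a forward one (artifact to what it depends on) and a reverse one (resource to the artifacts that depend on it). The Python class below is a hypothetical illustration of put/drop/get/rget/purge/rpurge plus the add/remove/replace batch operations, not a proposed implementation. Here `purge(x)` only forgets x's tracked dependencies (the "redundant once expired" case above) and `rpurge(y)` does that for every direct dependent of y; a real system would also purge the cached artifacts themselves. Keeping both directions in sync is what makes "purge all renderings that depend on Y" a lookup rather than a scan.

```python
from collections import defaultdict


class DependencyStore:
    """In-memory model of the put/drop/get/rget/purge/rpurge interface."""

    def __init__(self) -> None:
        self._deps: defaultdict[str, set[str]] = defaultdict(set)   # x -> what x depends on
        self._rdeps: defaultdict[str, set[str]] = defaultdict(set)  # y -> what depends on y

    def put(self, x: str, y: str) -> None:
        self._deps[x].add(y)
        self._rdeps[y].add(x)

    def drop(self, x: str, y: str) -> None:
        self._deps[x].discard(y)
        self._rdeps[y].discard(x)

    def get(self, x: str) -> set[str]:
        return set(self._deps.get(x, ()))

    def rget(self, y: str) -> set[str]:
        return set(self._rdeps.get(y, ()))

    def purge(self, x: str) -> None:
        """Forget artifact x and all of its tracked dependencies (e.g. it expired)."""
        for y in self._deps.pop(x, set()):
            self._rdeps[y].discard(x)
            if not self._rdeps[y]:
                del self._rdeps[y]  # nothing depends on y any more

    def rpurge(self, y: str) -> set[str]:
        """Purge every artifact that directly depends on y; return the purged artifacts."""
        victims = self.rget(y)
        for x in victims:
            self.purge(x)
        return victims

    # Batch operations
    def add(self, x: str, *ys: str) -> None:
        for y in ys:
            self.put(x, y)

    def remove(self, x: str, *ys: str) -> None:
        for y in ys:
            self.drop(x, y)

    def replace(self, x: str, *ys: str) -> None:
        """Set x's dependencies to exactly ys, e.g. after a re-render."""
        self.purge(x)
        self.add(x, *ys)
```

For example, a per-language rendering of a multilingual page can record a dependency on Template:LangSwitch, and `replace` drops that edge if a later render no longer uses it.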

Liuxinyu970226 subscribed. (May 21 2016, 2:36 AM)

Mholloway subscribed. (May 31 2016, 3:04 PM)

RobLa-WMF moved this task from Request IRC meeting to Under discussion on the TechCom-RFC board. (Jun 1 2016, 7:45 PM)

I'm going to make a note here mainly for myself as shepherd. In E202, I noted I was going to take this off of our weekly action items. There's probably some work that needs to happen to incorporate Daniel's thinking expressed in T102476#2309963 into the document at https://www.mediawiki.org/wiki/Requests_for_comment/Requirements_for_change_propagation . @daniel and @GWicke seemed to believe that this RfC is on track, and would be a good example of an RFC where we might need the "on track" state we discussed (see T137860)

RobLa-WMF added a project: TechCom-Has-shepherd. (Jul 13 2016, 5:09 AM)

RobLa-WMF moved this task from Backlog to RobLa-WMF on the TechCom-Has-shepherd board. (Jul 13 2016, 5:14 AM)

I'm leaving myself marked as the shepherd on the TechCom-Has-shepherd board. The role of shepherd is mainly advisory, and not a good use for "assignee". Also, moving this to the "in progress" (called "on track" in T102476#2381742)

GWicke updated the task description. (Oct 5 2016, 8:49 PM)

GWicke lowered the priority of this task from High to Low. (Oct 12 2016, 10:28 PM)

GWicke edited projects, added Services (watching); removed Services.

daniel removed a project: TechCom-RFC. (Nov 16 2016, 6:39 PM)

daniel added a project: TechCom-RFC.

daniel moved this task from RobLa-WMF to Daniel on the TechCom-Has-shepherd board. (Nov 16 2016, 6:45 PM)

daniel mentioned this in T114662: RFC: Per-language URLs for multilingual wiki pages. (Nov 28 2016, 7:40 PM)

daniel mentioned this in T154738: Add wikitext grammer for embedding properties from other pages. (Jan 25 2017, 8:06 PM)

MZMcBride subscribed. (Jan 25 2017, 11:00 PM)

Restricted Application added a project: Analytics. (Jan 25 2017, 11:00 PM)

Nuria moved this task from Incoming to Radar on the Analytics board. (Jan 30 2017, 5:37 PM)

Krinkle removed projects: TechCom-Has-shepherd, Proposal. (Dec 21 2017, 11:43 PM)

Krinkle moved this task from In progress to Under discussion on the TechCom-RFC board. (Dec 22 2017, 12:47 AM)

This seems obsolete. Is there any interest in keeping this open and continue the RFC process here?

Change propagation is an important topic, but it seems we need a fresh start.

Scott_WorldUnivAndSch subscribed. (Jan 9 2018, 7:49 PM)

In other languages? At Wikimania 2017, Wikimedia's Director Katherine Maher mentioned potentially 7k languages in Wikipedia by 2030.

Closing this as Resolved: the RFC provided the guidance needed to implement the ChangeProp service.

Krinkle moved this task from Untriaged to Implemented on the TechCom-RFC (TechCom-RFC-Closed) board. (Mar 8 2018, 3:30 AM)

Liuxinyu970226 unsubscribed. (Mar 8 2018, 2:51 PM)

Aklapper edited projects, added Analytics-Radar; removed Analytics. (Jun 10 2020, 6:44 AM)

Aklapper removed subscribers: RobLa-WMF, Anomie. (Oct 16 2020, 5:43 PM)
