
RFC: Publish all resource changes to a single topic
Closed, Resolved (Public)


We have started to create a variety of topics with specific information about events in MediaWiki. The rich event-specific information provided is useful for analytics and many other use cases, but is not typically needed for basic change propagation.

The main piece of information for basic change propagation is *what* has actually changed. We currently tend to use the public URL of the modified resource for this purpose. This information alone is already sufficient for a variety of change propagation use cases:

  • Varnish purging.
  • Update Parsoid HTML after an edit.
  • Update mobile HTML once Parsoid HTML for a page has been updated.
  • Update HTML in other projects after edits to (a part of) a Wikidata item.
  • Pool / depool servers after an etcd resource change, or more generally, publish etcd resource changes.


Renames (moves) are traditionally difficult to represent in an idempotent way. However, many change propagation use cases don't actually need to know that a move happened. Knowing that both the old and the new location were modified is sufficient to implement all update use cases mentioned above.
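As a rough illustration of the point about moves, here is a minimal Python sketch that represents a move as two independent resource changes, one per location. Only the meta.uri field comes from this discussion; the function name and the example URLs are assumptions made for illustration.

```python
def move_to_resource_changes(old_uri, new_uri):
    """Represent a move as two plain resource change events.

    Neither event says that a move took place; each one only states
    that the resource at the given URL changed. Re-processing either
    event is harmless, which sidesteps the idempotency problem.
    """
    return [
        {"meta": {"uri": old_uri}},
        {"meta": {"uri": new_uri}},
    ]


# Hypothetical example URLs.
changes = move_to_resource_changes(
    "https://example.org/wiki/Old_title",
    "https://example.org/wiki/New_title",
)
```

A consumer such as a purger would treat both events exactly like any other edit, so it needs no move-specific logic.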

Some changes might not have existing URLs associated with them. For example, revision suppressions modify revision metadata, but MediaWiki does not define a well-known URL for the revision metadata / restrictions themselves. One option we could consider is to add a little bit of extra information about the kind of change. If the main event was a revision suppression event produced to the revision_restrictions topic, then the topic name could be automatically added as a tag to the change event.
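The tagging idea could look roughly like the following Python sketch. The field name "tags" is hypothetical, as is the assumption that the suppression event carries a usable URL in meta.uri; only the revision_restrictions topic name and the idea of using the topic name as a tag come from the text above.

```python
def tagged_resource_change(source_topic, event):
    """Build a resource change event tagged with the originating topic."""
    return {
        "meta": {"uri": event["meta"]["uri"]},
        # Hypothetical field: lets consumers tell what kind of change
        # this was without inspecting the full specialized event.
        "tags": [source_topic],
    }


# Hypothetical suppression event; the URL and rev_id are placeholders.
suppression = {
    "meta": {"uri": "https://example.org/wiki/Some_page"},
    "rev_id": 12345,
}
change = tagged_resource_change("revision_restrictions", suppression)
```

Consumers that only care about the URL can ignore the tag, while others can filter on it.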

Producing resource URLs to the "resource change" topic by default

For event producers, it would be convenient if events produced to specialized topics could also implicitly emit an event to the "resource change" topic, containing the URL that is required in each event's meta block. Technically, this should be relatively straightforward to implement in the EventBus proxy service. We will, however, need to make sure that events contain sensible URLs. We could consider automatically forwarding only specific specialized topics, to avoid broken URLs in the resource topic.
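The forwarding step might be sketched in Python as follows. This is not the EventBus proxy's actual code: the allowlist, the function names, and the definition of a "sensible" URL (an absolute http or https URL with a host) are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of specialized topics whose events are forwarded.
FORWARDED_TOPICS = {"revision_restrictions"}


def has_sensible_uri(uri):
    """Assumed sanity check: an absolute http(s) URL with a host."""
    if not isinstance(uri, str):
        return False
    try:
        parts = urlsplit(uri)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. bad IPv6 literals.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def derive_resource_change(topic, event):
    """Return a resource change event for `event`, or None to skip it."""
    if topic not in FORWARDED_TOPICS:
        return None
    uri = event.get("meta", {}).get("uri")
    if not has_sensible_uri(uri):
        return None
    # Only the URL is carried over; event-specific details stay in the
    # specialized topic.
    return {"meta": {"uri": uri}}
```

Returning None for both unlisted topics and malformed URLs keeps broken entries out of the resource topic without failing the original produce request.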

Event Timeline

GWicke raised the priority of this task from to High.
GWicke updated the task description.

@mobrovac, etcd has its own event stream / watch mechanism. Would the attraction of relaying such events to Kafka be in keeping them around for longer, allowing consumers to catch up later?

Granted, pooling / depooling is not the best example, but the idea is that it might be simpler to have EventBus consumers than etcd watchers to follow changes and propagate them.

We have now implemented basic event emission along these lines in T109742. The emitted events only contain meta.uri. For now, no actual events are produced to Kafka. Instead, purges are processed locally in RESTBase.

Next steps:

  1. Set up a resource_change topic in EventBus.
  2. Produce to that from RESTBase and MediaWiki, possibly by automatically deriving from more specialized events.
  3. Set up a purge service to process all URLs in this topic, and stop processing purges in RESTBase itself.
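Step 3 could be sketched in Python along these lines. The events are passed in as a plain iterable rather than read from Kafka, and the purge callable stands in for whatever actually sends the purge; both, along with the de-duplication within a batch, are assumptions for illustration rather than a description of the planned service.

```python
def process_purges(events, purge):
    """Call `purge(uri)` once per distinct URL found in `events`.

    `events` is any iterable of resource change events; `purge` is a
    caller-supplied function. Returns the list of purged URLs in order.
    """
    seen = set()
    purged = []
    for event in events:
        uri = event.get("meta", {}).get("uri")
        if not uri or uri in seen:
            # Skip events without a URL and repeats within this batch;
            # purging is idempotent, so dropping repeats loses nothing.
            continue
        seen.add(uri)
        purge(uri)
        purged.append(uri)
    return purged


# Usage with a stand-in purge function that just records the URLs.
sent = []
batch = [
    {"meta": {"uri": "https://example.org/a"}},
    {"meta": {"uri": "https://example.org/a"}},
    {"meta": {"uri": "https://example.org/b"}},
]
process_purges(batch, sent.append)
```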

Change 273916 had a related patch set uploaded (by Mobrovac):
Add the resource_changed event

Change 273916 merged by Mobrovac:
Add the resource_change event

Change 274785 had a related patch set uploaded (by Ottomata):
Add topic config for wmf.resource_change

Change 274785 merged by Ottomata:
Add topic config for wmf.resource_change

Pchelolo claimed this task.

The resource_change topic has been implemented and enabled in production, so I'm closing this RFC.