We need a reliable way to distribute a variety of update events emitted from MediaWiki core (and other services) to various consumers. Currently we use the job queue for this (e.g. in the [Parsoid extension](https://github.com/wikimedia/mediawiki-extensions-Parsoid/blob/master/Parsoid.hooks.php)), but the job queue is fairly complex, not very reliable, and does not support multiple consumers without setting up separate job types.
We are looking for a solution that decouples producers from consumers, and gives us better reliability than the current job queue.
## Event type candidates
- Wikidata updates: summary of changes (ideally with details of the actual changes)
- use case: keeping the #wikidata-query-service up to date
- Page edits, moves and visibility changes (page / revision deletion / suppression); pretty much what is tracked in [the Parsoid extension](https://github.com/wikimedia/mediawiki-extensions-Parsoid/blob/817a7581f1ba554415128449b7a0a6a00248a443/Parsoid.hooks.php#L66)
- use case: keeping restbase content and caches up to date
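To make the event types above concrete, here is a minimal sketch of what a page-change event envelope could look like. All field names are illustrative assumptions; the actual schema would have to be agreed on as part of the "define events" task below.

```python
import json
import time

def make_page_change_event(wiki, title, rev_id, event_type="edit"):
    """Build a hypothetical page-change event envelope.

    Field names here are placeholders, not an agreed schema:
    event_type would cover edit / move / delete / suppress.
    """
    return {
        "type": event_type,
        "wiki": wiki,              # e.g. "enwiki"
        "title": title,
        "rev_id": rev_id,
        "timestamp": int(time.time()),
    }

# A consumer like RESTBase would receive this as serialized JSON.
event = make_page_change_event("enwiki", "Main_Page", 12345)
print(json.dumps(event, sort_keys=True))
```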
## Requirements for an implementation
- persistent: state does not disappear on power failure & can support large delays (order of days) for individual consumers
- no single point of failure
- supports pub/sub consumers with varying speed
- ideally, lets various producers enqueue new events (not just MW core)
- example use case: restbase scheduling dependent updates for content variants after HTML was updated
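The "pub/sub consumers with varying speed" requirement can be illustrated with a toy append-only log that tracks an independent read offset per consumer. This is only an in-memory sketch of the access pattern; persistence and no-single-point-of-failure are exactly what a replicated system like Kafka would add on top.

```python
class EventLog:
    """Toy append-only event log with per-consumer read offsets.

    A slow consumer simply lags behind; publishing never blocks on
    consumers, and reading by one consumer never removes events for
    another. (Durability and replication are not modeled here.)
    """
    def __init__(self):
        self.events = []    # append-only list of events
        self.offsets = {}   # consumer name -> next index to read

    def publish(self, event):
        self.events.append(event)

    def poll(self, consumer, max_events=1):
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch

log = EventLog()
for i in range(3):
    log.publish({"rev_id": i})

# A fast consumer drains the log; a slow one has only read one event.
fast = [e for _ in range(3) for e in log.poll("restbase", 1)]
slow = log.poll("wdqs", 1)
```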
## Option 1: Kafka
Kafka is a persistent and replicated queue with support for both pub/sub and job queue use cases. We already use it at high volume for request log queueing, so have operational experience and a working puppetization. This makes it a promising candidate.
Rough tasks for an implementation:
- Set up a Kafka instance
- Figure out good producer & consumer interfaces
- could use the raw Kafka protocol, but there might be a benefit in some abstraction: could we use HTTP / WebSockets? See also: [RESTBase queueing notes](https://github.com/wikimedia/restbase-cassandra/blob/master/doc/QueueBucket.md)
- Define events & relative order requirements
- Hook up a synchronous producer to the relevant MediaWiki hooks
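The last task above, translating MediaWiki hooks into queue events, could look roughly like this. The real producer would be PHP code inside a MediaWiki extension; this Python sketch only shows the shape of the hook-to-event mapping, and the hook names listed are assumptions about which hooks would be relevant.

```python
import json

# Hypothetical mapping from MediaWiki hook names to event types.
HOOK_TO_EVENT = {
    "PageContentSaveComplete": "edit",
    "TitleMoveComplete": "move",
    "ArticleDeleteComplete": "delete",
}

def on_hook(hook_name, payload, send):
    """Translate a hook invocation into a serialized queue message.

    `send` stands in for the actual enqueue call (raw Kafka produce,
    or an HTTP POST if we add an abstraction layer).
    """
    event_type = HOOK_TO_EVENT.get(hook_name)
    if event_type is None:
        return None  # hook not relevant to the event bus
    message = dict(payload, type=event_type)
    send(json.dumps(message))
    return message

sent = []
on_hook("TitleMoveComplete", {"wiki": "enwiki", "title": "Foo"}, sent.append)
```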
## Open questions
- Should we abstract over the raw queue interface? Inspiration: [Amazon](http://aws.amazon.com/sqs/), [Google](https://cloud.google.com/appengine/docs/java/taskqueue/), [Azure](http://azure.microsoft.com/en-us/documentation/services/service-bus/)
- Where / how should we expand link table jobs? A consumer of the primary event that enqueues individual updates to another queue?
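The link-table question above amounts to a fan-out step: a consumer reads the primary edit event, looks up which pages link to the edited page, and enqueues one refresh job per page on a second queue. A sketch, where `backlinks` is a hypothetical lookup standing in for MediaWiki's link tables:

```python
def expand_link_updates(event, backlinks):
    """Fan out a primary edit event into per-page refresh jobs.

    `backlinks` maps a title to the pages linking to it; in a real
    deployment this lookup would query MediaWiki's link tables, and
    the returned jobs would be enqueued on a separate queue.
    """
    pages = backlinks.get(event["title"], [])
    return [{"type": "refresh", "title": p, "cause_rev": event["rev_id"]}
            for p in pages]

# A template edit expands into one refresh job per transcluding page.
jobs = expand_link_updates(
    {"title": "Template:Infobox", "rev_id": 42},
    {"Template:Infobox": ["PageA", "PageB"]},
)
```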
- How can we scale this down for third-party users?
- Can we build on the existing job queue as a fall-back?