EventStreams is moving along, and we need to figure out which streams of events, other than recentchanges (which will certainly be exposed), should be exposed in the public API.

I had previously just considered exposing as much as we can, but there may be reasons not to do so (redundancy with data API endpoints is one of them).

In Kafka, we currently have available:
- page-move
- page-delete
- page-undelete
- page-properties-change
- resource-change

There are others as well. The schemas for these events are defined at https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki. Should we include all or only some of these? Should we somehow compose these (via change-prop) into different event streams with different schemas altogether (e.g. an edit stream)?

(This is a parent/placeholder ticket for Q4 goals linking.)

https://meta.wikimedia.org/wiki/Research:MediaWiki_events:_a_generalized_public_event_datasource is a proposal that expands on the feature set currently available from RCStream. We would like to generalize this beyond just MediaWiki events and build a service that can make arbitrary streams of JSON events available for public consumption.

A brainstorm meeting about this was held on March 15, 2016. Notes from the meeting are here: https://etherpad.wikimedia.org/p/PublicEventBus

Tentative plan:
- Build a service that exposes configured Kafka topics via WebSockets or HTTP. Historical consumption by offset/timestamp and field filtering are TBD. At a minimum, this should be feature-compatible with the current RCStream (e.g. wiki filtering).
- Expose public events currently available in Kafka via this service.
- Produce recentchanges events to Kafka (possibly via the EventBus service, but maybe not).
- Serve recentchanges events from this service.
- Deprecate the Python/Redis-based RCStream service.
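The first plan item can be illustrated with a minimal Python sketch. It applies per-wiki filtering and optional field filtering to JSON events, then frames each result as a Server-Sent Events message for an HTTP response. This is not the EventStreams implementation, and several parts are assumptions. The list of raw messages stands in for values consumed from a Kafka topic. The top-level `wiki` key is an assumed schema field. `filter_events` and `to_sse` are hypothetical names.

```python
import json
from typing import Iterable, Iterator, Optional, Sequence, Set


def filter_events(
    messages: Iterable[bytes],
    wikis: Optional[Set[str]] = None,
    fields: Optional[Sequence[str]] = None,
) -> Iterator[dict]:
    """Yield decoded events, optionally restricted by wiki and projected to fields.

    `messages` stands in for raw JSON message values read from a Kafka topic.
    The top-level "wiki" key is an assumption about the event schema.
    """
    for raw in messages:
        event = json.loads(raw)
        # Wiki filtering: drop events whose (assumed) "wiki" field is not requested.
        if wikis is not None and event.get("wiki") not in wikis:
            continue
        # Optional field filtering: keep only the requested top-level keys.
        if fields is not None:
            event = {k: event[k] for k in fields if k in event}
        yield event


def to_sse(events: Iterable[dict]) -> Iterator[str]:
    """Frame each event as a Server-Sent Events message (one data line per event)."""
    for event in events:
        yield "event: message\ndata: " + json.dumps(event) + "\n\n"


if __name__ == "__main__":
    # Hypothetical sample messages, standing in for a Kafka topic.
    sample = [
        b'{"wiki": "enwiki", "type": "edit", "title": "Foo"}',
        b'{"wiki": "dewiki", "type": "log", "title": "Bar"}',
    ]
    for frame in to_sse(filter_events(sample, wikis={"enwiki"}, fields=["type", "title"])):
        print(frame, end="")
```

In this sketch the filtering happens on the service side, so clients do not receive events for wikis they did not ask for. That mirrors the wiki filtering the plan asks to keep from RCStream. Historical consumption by offset or timestamp is left out here, as it is still TBD above.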