Can we use a stream of events for facilitating incrementals of the project content dumps?
Open, High · Public

Description

How can we get and use a stream of MediaWiki events to facilitate incrementals of the project content dumps, and what would this stream contain?

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 3 2016, 5:31 PM
ArielGlenn triaged this task as Normal priority. · Mar 3 2016, 5:31 PM
brion added a subscriber: brion. · Mar 4 2016, 9:52 PM

Could this plug into the existing RCStream thingy? Or does that provide enough events to trigger small update/sync jobs?

ArielGlenn raised the priority of this task from Normal to High. · Mar 7 2016, 6:10 PM

The idea would be to use the shiny new EventBus system and listen to events there. Eventually we are hoping to use it for RCStream as well, but that's out of scope of this ticket :)

We definitely had a lot of discussion about EventBus use; RCStream is not 100% reliable and the format is pretty clunky too.

I always forget, is content stored in MySQL? If so, maybe T120242 would help? Maybe not, since the content is in a crazy binary format?

@Ottomatta I might not understand your question properly. Page content (wikitext) is available in the external stores (MySQL DBs), and we get it from there for the current adds/changes dumps. HTML with a lot of extra markup is stored in RESTBase and would be retrieved from there for 'incremental' HTML dumps if those were produced.
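
To make the question in the description a bit more concrete, here is a rough sketch of how a batch of events could be reduced to the work an incremental run needs to do. Everything in it is an assumption for illustration: the event types (`revision_create`, `page_delete`), the field names (`page_id`, `rev_id`), and the function and class names are hypothetical and not taken from EventBus. It also assumes events arrive in order, and it only decides which pages to dump; fetching the content itself (from the external stores or RESTBase) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalPlan:
    # page_id -> newest rev_id seen in this batch (hypothetical shape)
    to_dump: dict = field(default_factory=dict)
    # page_ids whose last relevant event in this batch was a deletion
    deleted: set = field(default_factory=set)


def plan_incremental(events):
    """Collapse an ordered batch of events into a per-page plan.

    Duplicate events are harmless: re-applying the same revision or
    deletion leaves the plan unchanged.
    """
    plan = IncrementalPlan()
    for ev in events:
        page_id = ev["page_id"]
        kind = ev["type"]
        if kind == "revision_create":
            previous = plan.to_dump.get(page_id, 0)
            plan.to_dump[page_id] = max(previous, ev["rev_id"])
            plan.deleted.discard(page_id)
        elif kind == "page_delete":
            plan.to_dump.pop(page_id, None)
            plan.deleted.add(page_id)
        # Any other event type is ignored in this sketch.
    return plan


if __name__ == "__main__":
    batch = [
        {"type": "revision_create", "page_id": 1, "rev_id": 10},
        {"type": "revision_create", "page_id": 1, "rev_id": 12},
        {"type": "revision_create", "page_id": 2, "rev_id": 11},
        {"type": "page_delete", "page_id": 2},
    ]
    plan = plan_incremental(batch)
    print(plan.to_dump, plan.deleted)
```

If something like this were used, the stream would need to carry at least a page identifier, a revision identifier, and an event type; whether it should carry more (moves, restores, suppression) is exactly the open question here.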