
Can we use a stream of events for facilitating incrementals of the project content dumps?
Open, High · Public

Description

How can we get and use a stream of MediaWiki events to facilitate incrementals of the project content dumps, and what would this stream contain?

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 3 2016, 5:31 PM
ArielGlenn triaged this task as Normal priority. · Mar 3 2016, 5:31 PM
brion added a subscriber: brion. · Mar 4 2016, 9:52 PM

Could this plug into the existing RCStream thingy? Or does that provide enough events to trigger small update/sync jobs?

ArielGlenn raised the priority of this task from Normal to High. · Mar 7 2016, 6:10 PM

> Could this plug into the existing RCStream thingy? Or does that provide enough events to trigger small update/sync jobs?

The idea would be to use the shiny new Event-Platform system and listen to events there. Eventually we are hoping to use it for RCStream as well, but that's out of scope of this ticket :)

We definitely had a lot of discussion about EventBus use; RCStream is not 100% reliable, and its format is pretty clunky too.

I always forget, is content stored in MySQL? If so, maybe T120242 would help? Maybe not, since the content is in a crazy binary format?

@Ottomatta I might not be understanding your question properly. Page content (wikitext) is available in the external stores (MySQL databases), and we get it from there for the current adds/changes dumps. HTML with a lot of extra markup is stored in RESTBase and would be retrieved from there for 'incremental' HTML dumps if those were produced.
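
To make the idea discussed in this thread a bit more concrete, here is a minimal sketch of how a consumer might fold a batch of events into a work list for an incremental dump run. Everything in it is an assumption for illustration: the event field names (`kind`, `page_id`, `rev_id`), the event kinds, and the function name `plan_incremental` are hypothetical and not taken from the Event-Platform schemas or from RCStream. Keeping only the newest revision per page is also a simplification; a real adds/changes dump may need every revision since the last run.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event kinds, chosen for illustration only.
CHANGE_KINDS = {"edit", "create", "undelete"}
REMOVE_KINDS = {"delete"}


def plan_incremental(events: Iterable[dict]) -> Tuple[Dict[int, int], Set[int]]:
    """Collapse events into (page_id -> newest rev_id to fetch, page_ids to drop)."""
    to_dump: Dict[int, int] = {}
    to_drop: Set[int] = set()
    for ev in events:
        page_id = ev["page_id"]
        kind = ev["kind"]
        if kind in CHANGE_KINDS:
            # A page that changed again after a delete is live once more.
            to_drop.discard(page_id)
            rev_id = ev["rev_id"]
            to_dump[page_id] = max(rev_id, to_dump.get(page_id, rev_id))
        elif kind in REMOVE_KINDS:
            # Nothing to fetch for a deleted page; just record the removal.
            to_dump.pop(page_id, None)
            to_drop.add(page_id)
        # Any other kind is ignored in this sketch.
    return to_dump, to_drop


if __name__ == "__main__":
    sample = [
        {"kind": "edit", "page_id": 1, "rev_id": 10},
        {"kind": "create", "page_id": 2, "rev_id": 11},
        {"kind": "edit", "page_id": 1, "rev_id": 12},
        {"kind": "delete", "page_id": 2},
    ]
    print(plan_incremental(sample))
```

The resulting page and revision IDs would then be used to fetch wikitext from the external stores, as the adds/changes dumps already do; that fetch step is deliberately left out here.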