How can we get and use a stream of MediaWiki events to facilitate incremental project content dumps, and what would this stream contain?
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T128513 Dumps 2.0 Platform design questions
Open | | None | T128754 Can we use a stream of events for facilitating incrementals of the project content dumps?
Event Timeline
Could this plug into the existing RCStream thingy? Does that provide enough events to trigger small update/sync jobs?
The idea would be to use the shiny new Event-Platform system and listen to events there. Eventually we hope to use it for RCStream as well, but that's out of scope for this ticket :)
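A minimal sketch of what the listening side could do, assuming events shaped like the public EventStreams `revision-create` stream (JSON objects with `database`, `page_id` and `rev_id` fields) delivered as server-sent-event `data:` lines. The function name `collect_changes` is hypothetical, and reading the HTTP stream itself is left out so the sketch runs offline; it only shows how raw events could be grouped into per-wiki sets of touched pages that small update/sync jobs would then process.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def collect_changes(sse_lines: Iterable[str]) -> Dict[str, Dict[int, int]]:
    """Group revision-create events into {wiki_db: {page_id: newest rev_id}}.

    Hypothetical helper; assumes each event is a single-line SSE "data:" payload
    carrying "database", "page_id" and "rev_id" fields.
    """
    changes: Dict[str, Dict[int, int]] = defaultdict(dict)
    for line in sse_lines:
        # Skip SSE "event:", "id:", comment and blank lines; only data lines carry JSON.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        pages = changes[event["database"]]
        page_id, rev_id = event["page_id"], event["rev_id"]
        # Keep only the newest revision seen for each page.
        if rev_id > pages.get(page_id, 0):
            pages[page_id] = rev_id
    return {wiki: dict(pages) for wiki, pages in changes.items()}


lines = [
    "event: message",
    'data: {"database": "enwiki", "page_id": 12, "rev_id": 100}',
    'data: {"database": "enwiki", "page_id": 12, "rev_id": 105}',
    'data: {"database": "dewiki", "page_id": 7, "rev_id": 3}',
]
print(collect_changes(lines))  # {'enwiki': {12: 105}, 'dewiki': {7: 3}}
```

Keeping only the newest revision per page suits a dump of current content; an incremental that must carry full history would collect every `rev_id` instead. Multi-line SSE `data:` payloads are not handled here.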
We definitely had a lot of discussion about EventBus use; the RC stream is not 100% reliable, and its format is pretty clunky too.
I always forget: is content stored in MySQL? If so, maybe T120242 would help. Or maybe not, since the content is in some crazy binary format?
@Ottomata I might not understand your question properly. Page content (wikitext) is available in the external stores (MySQL databases), and we get it from there for the current adds/changes dumps. HTML with a lot of extra markup is stored in RESTBase and would be retrieved from there for 'incremental' HTML dumps, if those were produced.
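For the HTML case, a sketch of how each changed revision could be mapped to a retrieval URL, assuming the REST API route `/api/rest_v1/page/html/{title}/{revision}` served by RESTBase. The helper name `restbase_html_url` is hypothetical and no HTTP request is made; it only illustrates turning (title, revision) pairs from the event stream into fetch targets.

```python
from urllib.parse import quote


def restbase_html_url(domain: str, title: str, rev_id: int) -> str:
    """Build the REST API URL for the HTML of one revision (hypothetical helper)."""
    # URL titles use underscores instead of spaces; everything else is
    # percent-encoded (UTF-8), including "/" so subpage titles stay one path segment.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{domain}/api/rest_v1/page/html/{encoded}/{rev_id}"


print(restbase_html_url("en.wikipedia.org", "AC/DC", 5))
# https://en.wikipedia.org/api/rest_v1/page/html/AC%2FDC/5
```

Wikitext for the adds/changes dumps would still come from the external stores; only the HTML variant would go through a route like this.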