Both T360794: Implement stream of HTML content on mw.page_change event and T331399: Create new mediawiki links change streams based on fragment/mediawiki/state/change/page are about emitting event data derived from the parsed HTML of MediaWiki page revisions.
Other projects also use events to represent data that is (or should be) derived from the page revision HTML:
- T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task
- T392833: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review
- Image suggestions
- Add a link
etc.
The output data of all of these depends (directly or indirectly) on the MediaWiki parsed HTML. The parsed HTML (and anything derived from it) can change due to things other than edits: template or transclusion changes, time passing, different parser versions, etc.
E.g., a page's topic prediction might change because a template dependency was edited, even though the page itself has no new revision.
Propagating all changes caused by reparsing is out of scope for current derived data projects that live outside of MediaWiki. However, even if MVPs do not need to update externally stored data derived from parsed HTML, getting the data model right at this stage will matter once we do need to.
We primarily need a model for a reusable stable identifier for a specific page revision rendering.
This task should follow the precedent set by T308017: Design Schema for page state and page state with content (enriched) streams. Data Engineering and MediaWiki engineers should collaborate on designing a good data model and event JSONSchema fragment that can represent MediaWiki's concept of a 'rendering' with a render_id.
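To make the idea of a stable rendering identifier concrete, here is a minimal sketch of how a consumer might key derived data by rendering rather than by revision alone. It is an illustration only, not a proposed schema: the class name, the `wiki_id`, `page_id` and `rev_id` field names, and the choice of an opaque string for `render_id` are all assumptions, and the actual fragment is what this task is meant to design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderingKey:
    """Hypothetical identifier for one rendering of one page revision."""
    wiki_id: str    # assumed field name, e.g. "enwiki"
    page_id: int    # assumed field name
    rev_id: int     # assumed field name
    render_id: str  # assumed to be an opaque string; the real type is undecided


def same_revision(a: RenderingKey, b: RenderingKey) -> bool:
    """True if both keys point at the same page revision on the same wiki."""
    return (a.wiki_id, a.page_id, a.rev_id) == (b.wiki_id, b.page_id, b.rev_id)


def is_rerender(old: RenderingKey, new: RenderingKey) -> bool:
    """True if `new` is a different rendering of the same revision as `old`.

    This is the case an edit-only model cannot express: the revision is
    unchanged, but the parsed HTML (and anything derived from it) may differ.
    """
    return same_revision(old, new) and old.render_id != new.render_id


# Example values are made up for illustration.
stored = RenderingKey("enwiki", 123, 456, "render-a")
after_template_edit = RenderingKey("enwiki", 123, 456, "render-b")
after_page_edit = RenderingKey("enwiki", 123, 457, "render-c")
```

With keys like these, derived data stored under `stored` can be recognised as coming from an older rendering of the same revision when `after_template_edit` arrives, while `after_page_edit` is an ordinary new revision.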
Done is:
- reusable event JSONSchema fragment for MediaWiki renderings designed and committed to schemas-event-primary.