With the implementation of [[https://www.mediawiki.org/wiki/Multi-Content_Revisions|MCR]] progressing, questions have arisen regarding the desired behavior of link tracking with respect to the content of slots other than the main slot. //Link tracking// here mainly refers to information maintained by LinksUpdate, including tables like `pagelinks`, `templatelinks`, `imagelinks`, `externallinks`, and also `page_props`. The question, however, extends to all information maintained by DataUpdate objects returned by `Content::getSecondaryDataUpdates`.
== Status Quo ==
Such //link tracking// mainly serves two purposes:
* Detecting when pages need to be re-rendered because content they depend on (e.g. templates) has changed.
* Finding instances of content or references for removal (e.g. images that are being deleted or external links that have been found to be spam).
Within MediaWiki PHP, the tracking information that is maintained via LinksUpdate is represented in the form of ParserOutput objects. These are consumed by the skin (e.g. to show categories or interlanguage links) and by edit filters (e.g. AbuseFilter rules).
Code associations for storage:
* Without MCR: Title relates to Page, which relates to the (current) Revision, which has (1) Content. (Via 1 row in `revision` by rev_id, with rev_text_id pointing to the item in text storage.)
* With MCR: Revision relates to (**one or more**) Content. (Via multiple rows in `slots` by slot_revision_id, where each entry points to 1 slot_content_id, with 1 content_address pointing to an item in text storage.)
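The storage associations above can be pictured with a small data model. This is only an illustrative sketch in Python, not MediaWiki code: the class names `Content` and `Revision`, the helper `content_addresses`, the role names, and the address strings are all hypothetical stand-ins for the rows described in the list.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for one content row."""
    content_id: int
    content_address: str  # illustrative pointer to an item in text storage


@dataclass
class Revision:
    """Hypothetical stand-in for a revision and its slot rows."""
    rev_id: int
    # With MCR: one entry per slot role, each pointing to one Content.
    # Without MCR this mapping would always hold exactly one entry.
    slots: Dict[str, Content]


def content_addresses(revision: Revision) -> Dict[str, str]:
    """Resolve each slot role of a revision to its text storage address."""
    return {role: content.content_address for role, content in revision.slots.items()}
```

The point of the sketch is that, with MCR, anything that previously assumed "one revision, one Content" has to iterate over the slots instead.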
Code associations for run-time access (currently, without MCR):
* WikiPage provides (1) ParserOutput (WikiPage::getParserOutput / PoolWorkArticleView::doWork).
* WikiPage internally gets ParserOutput by using the page's Revision to get (1) Content object.
* Then Content::getParserOutput invokes Parser with the raw text of the Content object (TextContent::fillParserOutput).
The subject of this RFC is how this will work when a revision has multiple Content objects associated (via slots).
== Side notes ==
The Services team is currently investigating new infrastructure for tracking the dependencies between generated artefacts and editable content in a more fine-grained way. That would allow us to de-couple the tracking mechanism for purging from the one for finding usages for administrative purposes. This option is however likely more than a year out.
Also note that at present, we have no way to track which slot uses a given resource. Adding that information to the links tables is conceptually simple, but is a lot of work for the DBAs, so it should only be done if actually needed.
== Questions ==
# Should the default behavior be to track only the resource usage of the main slot, requiring handler code for other slots to explicitly add tracking for their content? Or should extension authors not have to worry about that, and instead would have to make some effort to suppress such tracking?
** Pro tracking per default: Meet expectations of site admins (e.g. can find external links in all slots). Makes life easier for extension authors. Exposing e.g. a coordinate from a non-main slot "just works".
** Con tracking per default: Tracking may not be needed for purging. Suppressing default behavior is harder than calling a utility function.
# If the content of an auxiliary slot is not visible per default (in the standard /wiki/Foo view), should resource usage for it be tracked? It seems that, if we only track for purging, the answer should be "no". If we track to be able to find all usages, the answer should be "yes". Since we track for both, what should we do in the initial implementation of MCR?
** Pro tracking always: Allow all references to images, templates, pages, external links, etc to be found by site admins.
** Con tracking always: May purge the cached default view when things change that are not used in the default view (at least until we have more fine grained tracking).
Note that if all usage is always tracked, regardless of which slot it occurs in, this can be done in a completely generic way. If however tracking should in some way depend on the slot (role) the content is in, we'll need some kind of slot-role handler where the relevant code would live. It seems likely that we will need some kind of slot-role handler code anyway, e.g. for handling the purge action; we may also want the behavior of different slots to depend on the page type (file page, article page, template page, etc), but that is for another RFC.
== Proposal ==
Based on the discussion on March 28 (summary at T190063#4091409), the following is proposed:
When running links updates (after an edit, etc)
# construct a ParserOutput for each slot, and a ParserOutput for the combined output
# merge the link tracking information for all slots into the combined ParserOutput
# run a LinksUpdate based on the combined output
# run all other DataUpdates returned by the Content of all slots
Rationale: This approach preserves the maximum of information, and is easy to implement. The fact that it may lead to extraneous data tracking and spurious purging of the parser cache does not seem relevant in the light of the currently targeted use cases. This issue should be revisited in the context of the creation of an entirely new mechanism for tracking dependencies of generated artifacts for purging.
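The four steps of the proposal can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the MediaWiki implementation: `SlotOutput` is a hypothetical stand-in for the link-tracking part of a ParserOutput, and `merge_link_tracking`, `run_updates`, and the callback parameters are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class SlotOutput:
    """Hypothetical stand-in for the link-tracking part of a ParserOutput."""
    # Maps a links table name (e.g. "pagelinks") to the set of tracked targets.
    links: Dict[str, Set[str]] = field(default_factory=dict)


def merge_link_tracking(slot_outputs: Dict[str, SlotOutput]) -> SlotOutput:
    """Steps 1-2: build a combined output holding the union of all slots' links."""
    combined = SlotOutput()
    for output in slot_outputs.values():
        for table, targets in output.links.items():
            # A fresh set per table keeps the per-slot outputs unmodified.
            combined.links.setdefault(table, set()).update(targets)
    return combined


def run_updates(
    slot_outputs: Dict[str, SlotOutput],
    links_update: Callable[[SlotOutput], None],
    other_updates: Iterable[Callable[[], None]],
) -> SlotOutput:
    """Steps 3-4: run one links update on the combined output, then all other updates."""
    combined = merge_link_tracking(slot_outputs)
    links_update(combined)
    for update in other_updates:
        update()
    return combined
```

Because the merge is a plain union, it does not record which slot contributed a given link; that matches the note above that per-slot tracking in the links tables is not currently available.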
Relevant code experiment:
* https://gerrit.wikimedia.org/r/c/421794/6/includes/Render/RevisionRenderer.php#315 and below
* https://gerrit.wikimedia.org/r/c/405015/47/includes/Storage/PageMetaDataUpdater.php#1136
----
Further reading:
* Use cases https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions#Use_Cases
* On-wiki discussion https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PO
----
This RFC is intended to resolve questions about the expected behavior of tracking meta-data in links tables (pagelinks, imagelinks, templatelinks, etc), to guide the architecture and initial implementation of MCR related code. This RFC is not intended to gain approval for a technical solution, but for the requirements such a solution must meet.
Note that the mechanism for combining the HTML of multiple slots is beyond the scope of this RFC. The obvious approach is to let each slot decide how it presents itself in the standard "article" view. This allows slots to be freely combined. However, some central control of the layout may be desirable for well-known combinations of slots, e.g. for the integration of MediaInfo on file description pages for the DSC project.