With the implementation of [[https://www.mediawiki.org/wiki/Multi-Content_Revisions|MCR]] progressing, questions have arisen regarding the desired behavior of link tracking with respect to the content of slots other than the main slot. //Link tracking// here mainly refers to information maintained by LinksUpdate, including tables like `pagelinks`, `templatelinks`, `imagelinks`, and `externallinks`, as well as `page_props`. The question, however, extends to all information maintained by DataUpdate objects returned by `Content::getSecondaryDataUpdates`.
== Status quo ==
Such //link tracking// mainly serves two purposes:
* Detect when pages need to be re-rendered because content they depend on changes (e.g. templates).
* Find instances of content or references for removal (e.g. an image that was deleted, external links that have been found to be spam).
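
The two purposes above can be illustrated with a toy model. This is a minimal sketch in Python, not MediaWiki code: the class `LinkTracker`, its methods, and the page and target names are all hypothetical, and a real links table is a database table rather than an in-memory dictionary.

```python
from collections import defaultdict


class LinkTracker:
    """Toy stand-in for a links table such as `templatelinks` (hypothetical)."""

    def __init__(self):
        # Maps a target (template, file, URL) to the set of pages using it.
        self._used_by = defaultdict(set)

    def record(self, page, targets):
        """Replace the set of targets recorded for `page`."""
        for users in self._used_by.values():
            users.discard(page)
        for target in targets:
            self._used_by[target].add(page)

    def pages_using(self, target):
        """Second purpose: find every page that references `target`."""
        return sorted(self._used_by.get(target, ()))

    def pages_to_purge(self, changed_target):
        """First purpose: pages whose rendering depends on `changed_target`."""
        return self.pages_using(changed_target)


tracker = LinkTracker()
tracker.record("Berlin", ["Template:Infobox", "File:Map.png"])
tracker.record("Paris", ["Template:Infobox"])

print(tracker.pages_to_purge("Template:Infobox"))  # pages to re-render
print(tracker.pages_using("File:Map.png"))         # usages to clean up
```

In this model both purposes are served by the same lookup, which is why tracking more references than the rendering needs also leads to more purging.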
Within MediaWiki PHP, the tracking information that is maintained via LinksUpdate is represented in the form of ParserOutput objects. These are consumed by the skin (e.g. to display categories or language links), and by edit filters (e.g. AbuseFilter rules).
Code associations for storage:
* Without MCR: Title relates to Page, relates to the (current) Revision, which has (1) Content. (Via 1 row in `revision` by rev_id, with rev_text_id pointing to the item in text storage.)
* With MCR: Revision relates to (**one or more**) Content. (Via multiple rows in `slots` matched to rev_id by slot_revision_id, where each row has 1 slot_content_id, with 1 content_address pointing to an item in text storage.)
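
The difference between the two storage associations can be modelled as follows. This is a minimal Python sketch, not the MediaWiki schema: the `Content` and `Revision` classes, the `mediainfo` role name, and the address strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    model: str    # content model, e.g. "wikitext"
    address: str  # plays the role of content_address (hypothetical values)


@dataclass
class Revision:
    rev_id: int
    # Role name -> Content. Without MCR there is only ever a "main" entry.
    slots: dict = field(default_factory=dict)

    def get_content(self, role="main"):
        return self.slots[role]


# Without MCR: exactly one Content per revision.
pre_mcr = Revision(1, {"main": Content("wikitext", "tt:100")})

# With MCR: one or more Content objects, keyed by slot role.
with_mcr = Revision(2, {
    "main": Content("wikitext", "tt:101"),
    "mediainfo": Content("json", "tt:102"),
})

print(sorted(with_mcr.slots))
```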
Code associations for run-time access (currently, without MCR):
* WikiPage provides (1) ParserOutput (WikiPage::getParserOutput / PoolWorkArticleView::doWork).
* WikiPage internally gets ParserOutput by using the page's Revision to get (1) Content object.
* Then Content::getParserOutput invokes Parser with the raw text of the Content object (TextContent::fillParserOutput).
The subject of this RFC is how this will work when a revision has multiple Content objects associated (via slots).
== Side notes ==
The Services team is currently investigating new infrastructure for tracking the dependencies between generated artefacts and editable content in a more fine-grained way. That would allow us to de-couple the tracking mechanism for purging from the one for finding usages for administrative purposes. This option is however likely more than a year out.
Also note that at present, we have no way to track which slot uses a given resource. Adding that information to the links tables is conceptually simple, but is a lot of work for the DBAs, so it should only be done if actually needed.
== Questions ==
# Should the default behavior (e.g. when saving an edit) be to store references in link tables from only the main Content slot, or should references from extra (MCR) Content slots also be saved to link tables? In other words, does a ContentHandler need to enable tracking, or should it work by default for extension authors, with a way to disable tracking?
** Pros of tracking all slots: Meets expectations of end-users (e.g. finding external links in all slots via Whatlinkshere), and makes things easier for extension authors. !!Exposing e.g. a coordinate from a non-main slot "just works".!!
** Cons of tracking all slots: Not all slots affect rendering. If we track all slots, changes to references from slots not used for rendering still end up purging the rendering. Suppressing the default behavior is harder than opting in.
# If the content of an extra slot is not visible per default (as in: does not affect default page view), should its links be tracked? It seems that, if we only track for purging, the answer should be "no". If we track to be able to find all uses (e.g. Whatlinkshere), then the answer should be "yes". Since we track for both reasons, what should the initial implementation of MCR do?
** Pros of tracking always: Allows all references to images, templates, pages, external links, etc. to be found by end-users.
** Cons of tracking always: May purge the cached default view when things change that are not used in the default view (at least until we have more fine-grained tracking).
If all usage is always tracked, regardless of which slots are used by rendering, then !!this can be done in a completely generic way.!!
If tracking should in some way depend on the role of the slot, then !!we'll need some kind of slot-role handler where the relevant code would live. It seems likely that we will need some kind of slot-role handler code anyway, e.g. for handling the purge action; we may also want the behavior of different slots to depend on the page type (file page, article page, template page, etc.), but that is for another RFC.!!
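
The contrast between generic tracking and role-dependent tracking can be sketched like this. It is a minimal Python illustration under stated assumptions: the function `tracked_links`, the `role_policy` mapping (a stand-in for whatever a slot-role handler would decide), and the role name `aux` are hypothetical and not part of MediaWiki.

```python
def tracked_links(slot_links, role_policy=None):
    """Return the set of link targets that would be written to links tables.

    slot_links:  role name -> set of targets referenced by that slot.
    role_policy: None for the generic case (track every slot), or a mapping
                 role name -> bool standing in for a slot-role handler.
                 Roles missing from the mapping are not tracked (opt-in).
    """
    combined = set()
    for role, links in slot_links.items():
        if role_policy is None or role_policy.get(role, False):
            combined |= set(links)
    return combined


slot_links = {
    "main": {"Template:Infobox"},
    "aux": {"File:Map.png"},
}

# Generic: no per-role code is needed.
generic = tracked_links(slot_links)

# Role-dependent: something has to supply the policy for each role.
role_based = tracked_links(slot_links, {"main": True, "aux": False})

print(sorted(generic))
print(sorted(role_based))
```

The generic case needs no knowledge of slot roles at all, while the role-dependent case needs a place for the per-role decision to live.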
== Proposal ==
Based on the discussion on March 28 (summary at T190063#4091409), the following is proposed:
When running links updates (after an edit, etc.):
# construct a ParserOutput for each slot, and a ParserOutput for the combined output
# merge the link tracking information from each slot's ParserOutput into the combined ParserOutput
# run a LinksUpdate with the combined output
# run all other DataUpdates !!returned by the Content of all slots!!
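
The steps above can be sketched as follows. This is a minimal Python model, not the MediaWiki implementation: `ParserOutputSketch`, `merge_tracking_from`, `run_links_updates`, and the update names `SearchUpdate` and `GeoUpdate` are hypothetical, and updates are represented by their names only instead of being executed.

```python
from dataclasses import dataclass, field


@dataclass
class ParserOutputSketch:
    """Hypothetical stand-in that holds only link tracking data."""
    page_links: set = field(default_factory=set)
    templates: set = field(default_factory=set)
    images: set = field(default_factory=set)
    external_links: set = field(default_factory=set)

    def merge_tracking_from(self, other):
        """Union the tracking data of `other` into this object."""
        self.page_links |= other.page_links
        self.templates |= other.templates
        self.images |= other.images
        self.external_links |= other.external_links


def run_links_updates(slot_outputs, other_updates_by_slot):
    """Merge per-slot outputs, then list the updates that would run."""
    combined = ParserOutputSketch()
    for output in slot_outputs.values():
        combined.merge_tracking_from(output)

    # One LinksUpdate for the combined output, then every other
    # DataUpdate contributed by the content of each slot.
    executed = ["LinksUpdate"]
    for role, updates in other_updates_by_slot.items():
        executed.extend(f"{role}:{name}" for name in updates)
    return combined, executed


slot_outputs = {
    "main": ParserOutputSketch(templates={"Template:Infobox"}),
    "aux": ParserOutputSketch(
        images={"File:Map.png"},
        external_links={"https://example.org"},
    ),
}
combined, executed = run_links_updates(
    slot_outputs, {"main": ["SearchUpdate"], "aux": ["GeoUpdate"]}
)
print(executed)
```

Because the merge is a plain union, the combined output cannot tell which slot contributed a given reference, which matches the side note that slot-level tracking is not available at present.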
Rationale: This approach preserves the most information and is easy to implement. The fact that it may lead to extraneous data tracking and spurious purging of the parser cache does not seem relevant in light of the currently targeted use cases. This issue should be revisited when an entirely new mechanism for tracking dependencies of generated artefacts for purging is created.
Relevant code experiments:
* https://gerrit.wikimedia.org/r/c/421794/6/includes/Render/RevisionRenderer.php#315 and below
* https://gerrit.wikimedia.org/r/c/405015/47/includes/Storage/PageMetaDataUpdater.php#1136
----
Further reading:
* Use cases https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions#Use_Cases
* On-wiki discussion https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PO
----
This RFC is intended to resolve questions about the expected behavior of tracking meta-data in links tables (pagelinks, imagelinks, templatelinks, etc.), to guide the architecture and initial implementation of MCR-related code. This RFC is not intended to gain approval for a technical solution, but for the requirements of such a solution.
Note that the mechanism for combining the HTML of multiple slots is beyond the scope of this RFC. The obvious approach is to let each slot decide how it presents itself in the standard "article" view. This allows slots to be freely combined. However, some central control of the layout may be desirable for well-known combinations of slots, e.g. for the integration of MediaInfo on file description pages for the DSC project.