With the implementation of [[https://www.mediawiki.org/wiki/Multi-Content_Revisions|MCR]] progressing, questions have arisen regarding the desired behavior of link tracking with respect to the content of slots other than the main slot. //Link tracking// here mainly refers to the information maintained by LinksUpdate, including tables like pagelinks, templatelinks, imagelinks and externallinks, as well as page_props. The question, however, extends to all information maintained by the DataUpdate objects returned by Content::getSecondaryDataUpdates.
== Status Quo ==
Such //link tracking// mainly serves two purposes:
* detecting when pages need to be re-rendered because resources they depend on (e.g. templates) have changed.
* finding usages of resources that should no longer be used (e.g. images that are being deleted or external links that have been found to be spam).
Beyond that, tracking information represented in ParserOutput objects (essentially, anything that later goes into LinksUpdate) can also be used by the skin (e.g. to show categories or interlanguage links) and by edit filters (e.g. AbuseFilter rules).
Note that the Services team is currently investigating new infrastructure for tracking the dependencies between generated artifacts and editable resources in a more fine-grained way. That would allow us to decouple the tracking mechanism used for purging from the one used for finding usages for administrative purposes. However, this option is likely more than a year out.
Also note that at present, we have no way to track which slot uses a given resource. Adding that information to the links tables is conceptually simple, but it is a lot of work for the DBAs, so it should only be done if actually needed.
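As a rough illustration of what tracking the slot would involve, the following Python sketch models link-table rows as a set of tuples. The `LinkRow` type, its `slot_role` field and the `build_rows` helper are hypothetical simplifications, not the actual MediaWiki schema. The sketch assumes rows are unique per combination of their fields, so without a slot column, uses of the same resource from different slots collapse into a single row.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class LinkRow(NamedTuple):
    """Hypothetical, simplified link-table row (not the real MediaWiki schema)."""
    from_page: int             # ID of the page that uses the resource
    target: str                # the resource used, e.g. "Template:Infobox"
    slot_role: Optional[str]   # hypothetical extra column; None if not tracked


def build_rows(page_id: int, usages_by_slot: Dict[str, List[str]],
               track_slot: bool) -> Set[LinkRow]:
    """Turn {slot_role: [targets]} into a set of rows.

    Without a slot column, rows are deduplicated per (page, target), so the
    information about which slot used the resource is lost.
    """
    rows: Set[LinkRow] = set()
    for role, targets in usages_by_slot.items():
        for target in targets:
            rows.add(LinkRow(page_id, target, role if track_slot else None))
    return rows


usage = {"main": ["Template:Infobox"], "doc": ["Template:Infobox", "Template:Note"]}
print(len(build_rows(42, usage, track_slot=False)))  # 2: slot information lost
print(len(build_rows(42, usage, track_slot=True)))   # 3: one row per (slot, target)
```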
== Questions ==
The main questions that arose in this context are:
# Should the default behavior be to track only the resource usage of the main slot, requiring handler code for other slots to explicitly add tracking for their content? Or should all slots be tracked by default, so that extension authors do not have to worry about it and would instead have to make some effort to suppress such tracking?
** Pro tracking by default: Meets the expectations of site admins (e.g. external links can be found in all slots). Makes life easier for extension authors. Exposing e.g. a coordinate from a non-main slot "just works".
** Con tracking by default: Tracking may not be needed for purging. Suppressing default behavior is harder than calling a utility function.
# If the content of an auxiliary slot is not visible by default (in the standard /wiki/Foo view), should its resource usage be tracked? It seems that if we only tracked for purging, the answer would be "no", and if we tracked in order to find all usages, the answer would be "yes". Since we track for both purposes, what should the initial implementation of MCR do?
** Pro tracking always: Allows all references to images, templates, pages, external links, etc. to be found by site admins.
** Con tracking always: May purge the cached default view when resources change that are not used in the default view (at least until we have more fine-grained tracking).
Note that if all usage is always tracked, regardless of which slot it occurs in, this can be done in a completely generic way. If, however, tracking should in some way depend on the slot (role) the content is in, we'll need some kind of slot-role handler where the relevant code would live. It seems likely that we will need some kind of slot-role handler code anyway, e.g. for handling the purge action; we may also want the behavior of different slots to depend on the page type (file page, article page, template page, etc.), but that is for another RFC.
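The difference between the generic approach and a role-dependent one might be sketched as follows in Python. `SlotRoleHandler`, `should_track_links`, `HiddenSlotRoleHandler` and `collect_links` are hypothetical names chosen for illustration; nothing in this RFC defines such an interface. Roles without a registered handler fall back to the generic "track everything" policy, while a handler can opt a role out, e.g. for content that is not shown in the default view.

```python
from typing import Dict, List, Protocol, Set


class SlotRoleHandler(Protocol):
    """Hypothetical per-role hook deciding whether a slot's links are tracked."""
    def should_track_links(self) -> bool: ...


class DefaultRoleHandler:
    # Generic policy: track resource usage in every slot.
    def should_track_links(self) -> bool:
        return True


class HiddenSlotRoleHandler:
    # Example role whose content is not shown in the default view and which
    # opts out of tracking (a purge-oriented policy).
    def should_track_links(self) -> bool:
        return False


def collect_links(links_by_role: Dict[str, List[str]],
                  handlers: Dict[str, SlotRoleHandler]) -> List[str]:
    """Collect tracked links, consulting the role handler for each slot."""
    default = DefaultRoleHandler()
    tracked: Set[str] = set()
    for role, links in links_by_role.items():
        if handlers.get(role, default).should_track_links():
            tracked.update(links)
    return sorted(tracked)


links = {"main": ["Foo"], "notes": ["Bar"]}
print(collect_links(links, {}))                                   # ['Bar', 'Foo']
print(collect_links(links, {"notes": HiddenSlotRoleHandler()}))  # ['Foo']
```

With an empty handler map, this degenerates to the completely generic behavior; the per-role lookup only becomes necessary once some role needs to deviate from it.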
== Proposal ==
Based on the discussion on March 28 (summary at T190063#4091409), the following is proposed:
When running links updates (after an edit, etc.):
# construct a ParserOutput for each slot, and a ParserOutput for the combined output
# merge the link tracking information for all slots into the combined ParserOutput
# run a LinksUpdate based on the combined output
# run all other DataUpdates returned by the Content of all slots
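The four steps above might be sketched as follows. This is a Python illustration, not the work-in-progress PHP code referenced below. `ParserOutput` here is a simplified stand-in for the real class, and `Slot`, `render`, `merge_tracking_from` and `run_links_updates` are hypothetical names. Combining the HTML by simple concatenation is only a placeholder, since the mechanism for combining HTML is outside the scope of this RFC.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ParserOutput:
    """Hypothetical, simplified stand-in for ParserOutput: HTML plus tracked links."""
    html: str = ""
    # Maps a links table name (e.g. "templatelinks") to the targets used.
    links: Dict[str, Set[str]] = field(default_factory=dict)

    def merge_tracking_from(self, other: "ParserOutput") -> None:
        """Union the link tracking information of `other` into this output."""
        for table, targets in other.links.items():
            self.links.setdefault(table, set()).update(targets)


@dataclass
class Slot:
    """Hypothetical slot: a role, a renderer, and its content's DataUpdates."""
    role: str
    render: Callable[[], ParserOutput]
    data_updates: List[Callable[[], None]] = field(default_factory=list)


def run_links_updates(slots: List[Slot],
                      links_update: Callable[[ParserOutput], None]) -> ParserOutput:
    # Step 1: one ParserOutput per slot, plus one for the combined output.
    # Joining the HTML is a placeholder for the real combination mechanism.
    per_slot = [slot.render() for slot in slots]
    combined = ParserOutput(html="\n".join(po.html for po in per_slot))
    # Step 2: merge the link tracking information of all slots.
    for po in per_slot:
        combined.merge_tracking_from(po)
    # Step 3: a single LinksUpdate based on the combined output.
    links_update(combined)
    # Step 4: run all other DataUpdates of all slots.
    for slot in slots:
        for update in slot.data_updates:
            update()
    return combined


log: list = []
main = Slot("main", lambda: ParserOutput("<p>Foo</p>", {"templatelinks": {"Template:Infobox"}}))
aux = Slot("aux", lambda: ParserOutput("", {"externallinks": {"https://example.org"}}),
           data_updates=[lambda: log.append("aux update")])
result = run_links_updates([main, aux], lambda po: log.append(sorted(po.links)))
```

Because a single LinksUpdate writes the union of all slots' links, the approach is simple to implement, at the cost of possibly tracking usages that do not affect the default view.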
Rationale: This approach preserves the maximum amount of information and is easy to implement. The fact that it may lead to extraneous data tracking and spurious purging of the parser cache does not seem significant in light of the currently targeted use cases. This issue should be revisited in the context of creating an entirely new mechanism for tracking the dependencies of generated artifacts for purging.
----
Work-in-progress code, for reference:
* https://gerrit.wikimedia.org/r/c/405015/47/includes/Storage/PageMetaDataUpdater.php#1196
* https://gerrit.wikimedia.org/r/c/405015/47/includes/Storage/PageMetaDataUpdater.php#1136
Further reading:
* Use cases: https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions#Use_Cases
* On-wiki discussion: https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PO
----
This RFC is intended to resolve questions about the expected behavior of tracking metadata in links tables (pagelinks, imagelinks, templatelinks, etc.), to guide the architecture and initial implementation of MCR-related code. It is not intended to gain approval for a technical solution, but for the requirements of such a solution.
Note that the mechanism for combining the HTML of multiple slots is beyond the scope of this RFC. The obvious approach is to let each slot decide how it presents itself in the standard "article" view, which allows slots to be freely combined. However, some central control of the layout may be desirable for well-known combinations of slots, e.g. for integrating MediaInfo into file description pages for the DSC project.