
Use a dedicated mechanism to track page dependencies
Open, Needs Triage, Public

Description

Summary
The content of some pages may depend on the content of other pages; for now, let's call the former Source Pages and the latter Target Pages. The MediaWiki parser needs to keep track of these dependencies so that, when a Target Page changes, the cached output of each Source Page that depends on it can be invalidated and re-cached in due time. MediaWiki currently does this by reusing the tables that keep track of links (pagelinks, templatelinks, etc.), but this has caused issues. This task focuses on replacing that approach with a dedicated mechanism (including a new table) for tracking page dependencies. MediaWiki should also expose this new mechanism to extensions, so that they can register additional dependencies beyond those that MediaWiki core tracks.

Details
Let's start with two really straightforward examples of page dependencies:

  • If Source Page has a link to Target Page (i.e. [[Target_Page]] in its source), whether the link is shown as a blue or red link depends on whether Target Page exists. If Target Page is deleted, links to it should turn red; if it is recreated, they should turn blue. MediaWiki uses the pagelinks table to keep track of links between pages, and each time a page is created or deleted, it finds all pages that link to it and invalidates their cache.
  • If Source Page transcludes Target Page (i.e. {{Target_Page}} or {{Target_Page|...}} in its source), the content of Source Page will be based on the parsed output of Target Page, given the parameters provided to it as a template. Here, not only do the deletion and creation of Target Page matter, but so does its actual content; if its content changes, all pages that transclude it need to be parsed and cached again. MediaWiki uses the templatelinks table to identify all such pages.
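
The two cases above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the dictionaries stand in for the pagelinks and templatelinks tables, and the function names are hypothetical, not MediaWiki APIs.

```python
# Toy stand-ins for the pagelinks and templatelinks tables.
# Each maps a Target Page title to the set of Source Pages that reference it.
pagelinks = {}
templatelinks = {}

def record_link(source, target):
    """Source Page contains [[target]]."""
    pagelinks.setdefault(target, set()).add(source)

def record_transclusion(source, target):
    """Source Page contains {{target}}."""
    templatelinks.setdefault(target, set()).add(source)

def invalidate_on_existence_change(target):
    """Target was created or deleted: linking and transcluding pages are stale."""
    return pagelinks.get(target, set()) | templatelinks.get(target, set())

def invalidate_on_edit(target):
    """Target content changed: only transcluding pages need to be re-parsed."""
    return set(templatelinks.get(target, set()))

record_link("Article", "Target_Page")
record_transclusion("Portal", "Target_Page")
```

In this sketch, deleting or creating Target_Page affects both Article and Portal, while editing it affects only Portal.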

Now let's talk about three use cases that don't fit this design:

  • If the Source Page contains the {{PAGESINCATEGORY:...}} magic word (which is part of MediaWiki core), and the number of pages in the target category changes, MediaWiki has no mechanism to notice this and trigger a cache invalidation on the Source Page. (See T221795 for related issues surrounding category counts.)
  • If the Source Page uses the {{#categorytree:...}} function from MediaWiki-extensions-CategoryTree then its output will change as pages are added to or removed from some target category. This, again, is not something that is currently tracked by MediaWiki.
  • If the Source Page uses the {{#ifexist:...}} function from ParserFunctions to check the existence of a Target Page, the existence of the Target Page obviously has a direct impact on the output of the Source Page. To keep track of this, ParserFunctions repurposes the pagelinks table by inserting a link from Source Page to Target Page, but this causes some undesirable side effects. (See T14019.)
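
To make the last case concrete, here is a minimal sketch of one plausible side effect. It assumes, for illustration only, that a backlink listing is built from pagelinks alone; the page names and helper functions are hypothetical, and the actual details are in T14019.

```python
# Toy stand-in for the pagelinks table: Target Page -> set of Source Pages.
pagelinks = {}

def add_pagelink(source, target):
    pagelinks.setdefault(target, set()).add(source)

def what_links_here(target):
    """Backlink listing built from pagelinks alone."""
    return sorted(pagelinks.get(target, set()))

# "Article" contains a real [[Sandbox]] link.
add_pagelink("Article", "Sandbox")
# "Infobox" only contains {{#ifexist:Sandbox|...}}; the row exists purely
# so that the page is invalidated when Sandbox is created or deleted.
add_pagelink("Infobox", "Sandbox")

print(what_links_here("Sandbox"))  # ['Article', 'Infobox']
```

Nothing in the table distinguishes the real link from the existence check, so anything that reads pagelinks treats both rows the same way.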

Without going too much into what the solution should be, it may be helpful to set expectations. One could imagine a pagedependencies table that specifically tracks that the parsed output of some Source Page depends on the existence and/or output of some Target Page. This way, if ParserFunctions wants to introduce a dependency from Source Page to Target Page, it can add a row to pagedependencies without tainting pagelinks. Similarly, after parsing a page, the parser can check whether the new output has changed which categories the page belongs to, and then use pagedependencies to find the other pages whose content relies on the contents of those categories.
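
As a rough sketch of what such a table might look like, the following uses an in-memory SQLite database. The column names (pd_source, pd_target, pd_kind), the dependency kinds, and the helper functions are all assumptions made for illustration; the task does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pagedependencies (
        pd_source TEXT NOT NULL,  -- page whose cached output depends on the target
        pd_target TEXT NOT NULL,  -- page or category being depended on
        pd_kind   TEXT NOT NULL,  -- hypothetical: 'existence', 'content', 'members'
        PRIMARY KEY (pd_source, pd_target, pd_kind)
    )
""")

def add_dependency(source, target, kind):
    """Record that source depends on target; duplicate rows are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO pagedependencies VALUES (?, ?, ?)",
        (source, target, kind),
    )

def dependents(target, kind):
    """Return the pages to invalidate when target changes in the given way."""
    rows = conn.execute(
        "SELECT pd_source FROM pagedependencies "
        "WHERE pd_target = ? AND pd_kind = ? ORDER BY pd_source",
        (target, kind),
    )
    return [row[0] for row in rows]

# An #ifexist check and a PAGESINCATEGORY count, recorded without touching pagelinks.
add_dependency("Template:Infobox", "Sandbox", "existence")
add_dependency("Portal:Physics", "Category:Physics", "members")
```

When a page is added to Category:Physics, `dependents("Category:Physics", "members")` would yield the pages whose cached output needs refreshing.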

Ultimately, this task could open the way for T56902: Deprecate and remove the purge action from MediaWiki.

Considerations
It may be best to keep using the *links tables as we do now, and additionally use a dedicated mechanism to track only the dependencies that the *links tables cannot properly capture. The advantage is that the pagedependencies table will not duplicate data that already exists in the *links tables; the disadvantage is that it will fragment the dependency-tracking process even further.
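
The trade-off can be illustrated with a small sketch of the hybrid lookup. The data structures and the function name `pages_to_invalidate` are hypothetical; the point is only that every invalidation has to consult each *links table plus the new table.

```python
def pages_to_invalidate(target, link_tables, pagedependencies):
    """Collect Source Pages from every *links table and from pagedependencies.

    Each table is modelled as a dict mapping a Target Page to a set of
    Source Pages (an assumption made for this sketch).
    """
    affected = set()
    for table in link_tables.values():
        affected |= table.get(target, set())
    affected |= pagedependencies.get(target, set())
    return affected

link_tables = {
    "pagelinks": {"Sandbox": {"Article"}},
    "templatelinks": {"Sandbox": {"Portal"}},
}
# Only dependencies the *links tables cannot express live here.
pagedependencies = {"Sandbox": {"Template:Infobox"}}

affected = pages_to_invalidate("Sandbox", link_tables, pagedependencies)
```

No row is stored twice, but the lookup now spans three sources instead of one, which is the fragmentation described above.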