
ifexist function uses pagelinks table in lieu of better options
Open, LowPublic

Description

Description from merged task:
[[Special:WantedPages]] shows many links to pages "requested" via expressions like {{#ifexist:}}. Querying the existence of a page doesn't make it "wanted"; sometimes it's the exact opposite: we use this expression for filtering pages we DON'T WANT.

This affects the experience of editors who use Special:WantedPages to find articles or pages to work on, only to find this special page full of noise.

E.g.
https://en.wikipedia.org/wiki/Special:WantedPages
https://fr.wikipedia.org/wiki/Spécial:Pages_demandées


Original (2007) description
I've traced it down to the line $parser->mOutput->addLink( $title, $id ); which was added by tstarling in revision 19892 on Mon Feb 12 10:24:23 2007 UTC with the reason "Register a link on #ifexist. Otherwise this breaks cache coherency."

I can find no logical reason for this change. All it is doing is checking whether the target exists and outputting one of two user-supplied text blocks, and that is all it should do. It is not making a link to the target, nor does it display a link to the target anywhere in the scope of this function's code, so why does the target need to be added to the link list?

Granted, I do not have a complete grasp of the internals of the parser or the cache systems, but the feedback noise on Special:WhatLinksHere renders the page useless.


See also:
T18584: prop=links not include links from transcluded templates
T17735: Non-clickable links (from #ifexist) should be flagged in pagelinks
T12857: #ifexist: produces an entry in links list
T33628: When #ifexists target page is created/deleted, does not update links tables
T73637: mw.title.exists "pollutes" pagelinks table

This card tracks a proposal from the 2015 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey
This proposal received 11 support votes, and was ranked #62 out of 107 proposals. https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Miscellaneous#Error_categorization_by_.23ifexist_bug


Detailed explanation
This detailed explanation was prepared by @Huji in 2020 in the hopes that it would increase the likelihood of this (and T33628) being fixed.

When {{#ifexist:TargetPage|...|...}} is called, the parser adds a link from the Source Page to the Target Page in the pagelinks table. This is because if the status of the Target Page changes from nonexistent (i.e. red link) to existing (i.e. blue link) or vice versa, the MW parser needs a way to know which pages' caches need to be purged. The parser finds pages that need to be purged by crawling the pagelinks table for all pages linking to the Target Page and invalidating their cache. This has a few side effects.

Side Effect #1: even though the #ifexist call above doesn't really create a hyperlink from Source Page to Target Page, the pagelinks table thinks that such a link exists; this "fake" link will be reflected on Special:WhatLinksHere/Target_Page or Special:WantedPages, which is undesirable.

Side Effect #2: because the parser always uses the pagelinks table in the above process, when Target Page is a file or a category, the data about this "fake" link is actually stored in the wrong table (pagelinks, as opposed to imagelinks or categorylinks). Now, that is not all bad; if the right type of link were being created, we would see something similar to Side Effect #1 occurring in even more places (e.g. a category would list not only all pages in it, but also all pages that check the existence of the category). But when only one table is used to keep track of #ifexist calls, T33628 happens, which is *undesirable*.

Side Effect #3: because Special:WhatLinksHere/Target_Page shows a list of pages that check the existence of Target Page, it allows for tracking all pages that may check the existence of a particular page, e.g. through a template. This is helpful, for example, when you are editing a template: in preview mode, you can see a list of outgoing links from the template, including a link to Target Page (whose existence the template merely checks); if that is, say, a missing template subpage, you will see a red link and realize that it is missing. This effect is *desirable*.

Ideally, we want to do away with the undesirable effects, while maintaining the desirable one.
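The cache-invalidation crawl described above can be sketched in miniature. This is an illustrative model only, not MediaWiki's actual code; the table and column names follow the pagelinks schema, but the page ids and titles are made up for the example:

```python
import sqlite3

# Simplified model of the invalidation lookup: when a page is created or
# deleted, the parser crawls pagelinks for every page that "links" to it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pagelinks (
        pl_from      INTEGER,  -- page id of the Source Page
        pl_namespace INTEGER,  -- namespace of the Target Page
        pl_title     TEXT      -- title of the Target Page
    )""")

# Page 10 links to TargetPage with a real [[TargetPage]] link; page 20
# only calls {{#ifexist:TargetPage|...|...}} -- both end up in pagelinks.
conn.executemany(
    "INSERT INTO pagelinks VALUES (?, ?, ?)",
    [(10, 0, "TargetPage"), (20, 0, "TargetPage"), (30, 0, "OtherPage")],
)

def pages_to_invalidate(namespace, title):
    """Find every page whose cached rendering may depend on the
    existence of (namespace, title), by crawling pagelinks."""
    rows = conn.execute(
        "SELECT pl_from FROM pagelinks WHERE pl_namespace = ? AND pl_title = ?",
        (namespace, title),
    )
    return {r[0] for r in rows}

# Creating or deleting TargetPage purges both the real linker and the
# #ifexist caller -- the parser cannot tell them apart.
print(sorted(pages_to_invalidate(0, "TargetPage")))  # [10, 20]
```

This is exactly why Side Effect #1 arises: the table that drives invalidation is the same table that Special:WhatLinksHere and Special:WantedPages read.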

Details

Reference
bz12019

Related Objects

Event Timeline

  • Bug 71636 has been marked as a duplicate of this bug.
  • Bug 71637 has been marked as a duplicate of this bug.

beachtyper wrote:

This bug is really annoying if your userbase uses Special:WantedPages to guide their editing and you've got templates which liberally use #ifexist. Has any progress been made in resolving it?

This point is worth reiterating. The end user experience problem here is that applying this function to a page that doesn't exist causes it to show up in WantedPages even though that is (arguably) not the expected behavior. Asking whether a page exists doesn't make it wanted.

IMPORTANT: If you are a community developer interested in working on this task: The Wikimedia Hackathon 2016 (Jerusalem, March 31 - April 3) focuses on #Community-Wishlist-Survey projects. There is some budget for sponsoring volunteer developers. THE DEADLINE TO REQUEST TRAVEL SPONSORSHIP IS TODAY, JANUARY 21. Exceptions can be made for developers focusing on Community Wishlist projects until the end of Sunday 24, but not beyond. If you or someone you know is interested, please REGISTER NOW.

Comment from merged task (just in case it's helpful):

The #ifexist parser function is provided by the ParserFunctions extension.

However, the fact that the link is registered even if the title does not exist was an explicit change back in 2008. I'm not sure why anymore (the commit message doesn't explain why :P), and I'm not sure if this is the _expected_ behaviour :) Adding tstarling, maybe he still remembers :)

I would like to notify you that there are users on Russian Wikipedia who are disturbed by this phenomenon, as there are modules/templates which automatically check the existence of various pages, and tools for finding nonexistent linked pages fail when they come across pages that are checked by #ifexist (details in Russian).

I filed a new report about this thinking it was a recent problem - I didn't realise it dated back to 2007! The new report I posted was:

#ifexist (and Lua equivalents) are very useful tools, particularly for automatically-created infoboxes from Wikidata when checking to see if useful redirects to articles exist (e.g. to improve how location links are displayed).
However, it has an unexpected side effect: if you check whether a page exists, the checking page also gets included in the 'WhatLinksHere' special page for that page. This seems to be deliberate, as it's documented at https://www.mediawiki.org/wiki/Help:Extension:ParserFunctions#.23ifexist - however, there is no explanation there of why this is.
It is a problem as there are people on enwp that are looking for links to disambig pages to fix them - and they use WhatLinksHere to do so. Automated checking of the existence of pages (even using Lua to look for redirects only, e.g. see https://en.wikipedia.org/w/index.php?title=Module:Citeq&diff=prev&oldid=802553277 ) can easily cause thousands of such links to appear, which seriously disrupts the dab fixers' workflow.
A solution here would be to only include the page in WhatLinksHere if the #ifexist (or Lua equivalent) check actually results in a link being made to that page. Is that an easy change to make, or are there reasons for not making it?
For cases where this has caused issues, please see:

Any chance of fixing this soon? (!)

So yeah, this finished far back in the pack in the 2015 Community Wishlist Survey. You could bring it back up for the 2017 survey, which is currently in progress. The chances it will finish in the top ten are slim to none, and we're lucky if the folks working on this stuff can even get ten of these knocked off in a full year. So don't hold your breath.

The community survey will result in more OPTIONAL work being done on things that are not REALLY necessary that fix NOTHING that is actually BROKEN.

Complaining that no one has worked on this bug doesn't do anything to help resolve it. Developer time is limited, and dominated by fixing more critical bugs than this and developing new functionality. Compared to that, this is a quibble with a feature that already works, which produces editor annoyance but doesn't actually block any critical wikiwork. Rather than complaining into the void, maybe you should read the above discussion and related bug reports and contribute something meaningful to the discussion (or even better, submit some code)?

I've added a proposal to fix this to the 2017 community wish list - see https://meta.wikimedia.org/wiki/2017_Community_Wishlist_Survey/Miscellaneous/Stop_ifexist_checks_from_appearing_in_Special:WhatLinksHere . Comments/feedback/description edits are welcome there!

The logs below will need to be accounted for in T33628:

  • User (talk | contribs) created page A — A becomes a blue link
  • User (talk | contribs) imported A by file upload (# revisions) (or User (talk | contribs) imported A from (interwiki prefix):title (# revisions)) — A becomes a blue link
  • User (talk | contribs) restored page A (# revisions) — A becomes a blue link
  • User (talk | contribs) moved page A to B (or User (talk | contribs) moved page A to B without leaving a redirect) — B becomes a blue link
  • User (talk | contribs) deleted page A — A becomes a red link
  • User (talk | contribs) moved page A to B without leaving a redirect (or User (talk | contribs) moved page A to B over a redirect without leaving a redirect) — A becomes a red link

surely this can be fixed after 10 years? :)

I'm a complainer, not a hacker :D

I expanded the task definition so that it is clearer what the problem is, how it manifests itself (including 2 undesirable and 1 desirable side effects), and what we expect a potential solution to accomplish.

One radical idea is to modify the pagelinks table to have a column named linktype with two possible values: explicit, meaning there actually is a blue/red link shown from Source Page to Target Page using an <a> tag, or implicit, meaning the parser is "reserving" a link from Source Page to Target Page, as is the case with the #ifexist parser function. Obviously, if both of these exist for a page, the explicit one will supersede the implicit one. This way, the parser will have a way to keep track of all the "fake" page links it is creating for future cache invalidation purposes, but special pages like WhatLinksHere or WantedPages can be restricted to only explicit links.

I describe it as a "radical" idea because pagelinks is a massive table, so altering it is no joke. Also, we would need to create additional indexes (for the sake of special pages that only care about explicit links and would need an additional WHERE clause in their queries). This idea would certainly require DBA review and approval.
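The proposed linktype column can be modeled in a few lines. The column name and its values here are hypothetical, taken from the comment above rather than any actual schema change:

```python
import sqlite3

# Hypothetical pl_linktype column on a simplified pagelinks table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pagelinks (
        pl_from     INTEGER,
        pl_title    TEXT,
        pl_linktype TEXT CHECK (pl_linktype IN ('explicit', 'implicit'))
    )""")
conn.executemany(
    "INSERT INTO pagelinks VALUES (?, ?, ?)",
    [
        (10, "TargetPage", "explicit"),  # real [[TargetPage]] link
        (20, "TargetPage", "implicit"),  # {{#ifexist:TargetPage|..|..}} only
    ],
)

# Special:WhatLinksHere would add a WHERE clause (hence the extra
# indexes mentioned above) and show only the real link:
what_links_here = [r[0] for r in conn.execute(
    "SELECT pl_from FROM pagelinks "
    "WHERE pl_title = ? AND pl_linktype = 'explicit'", ("TargetPage",))]
print(what_links_here)  # [10]

# Cache invalidation still sees both kinds of row:
to_purge = [r[0] for r in conn.execute(
    "SELECT pl_from FROM pagelinks WHERE pl_title = ?", ("TargetPage",))]
print(sorted(to_purge))  # [10, 20]
```

The design keeps one table for both purposes but lets the special pages filter out the "reserved" rows.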

Since {{#ifexist: }} is provided by Extension:ParserFunctions and not by any core functionality, I don't think modifying a core table like the previous comment suggests would be the right approach.

Instead, I think it would be better to have a separate table to hold this information, with a structure similar to, if not the same as, the pagelinks table. That way it will remain independent from core and wouldn't require modifying the structure of the pagelinks table, which seems complicated in WMF production.

I like your thought, but I don't completely agree. Even though ParserFunctions is a non-core feature, I think the root cause of this problem is in MW core; specifically, the root cause is that MediaWiki is using one table (pagelinks) for two purposes: its primary purpose, which is to keep track of links from one page to another, and a secondary purpose, which is to invalidate caches of pages when the status of a linked page changes.

In a completely different design, MediaWiki could have used a pagelinks table only for the first purpose, and a pagedependencies table for the second purpose. In fact, the pagelinks table was created by merging the links and brokenlinks tables way back in version 1.5, and its intended use (as well as that of those ancestral tables) was primarily to keep track of internal links.

Therefore, while I agree with you in principle that the core MW table schema should not change because of an extension, I think the real issue here is how MW handles cache invalidation, and enhancing that would help us solve the problems manifested by the ParserFunctions extension.

A weaker argument exists against your suggestion too: that ParserFunctions should be merged into core :) After all, for at least seven years there has been a request at T46429: Please enable ParserFunctions in new installs of MediaWiki by default

Such a feature would also be helpful for core and should be usable by the extensions.

The feature needs a database table to hold the connection/dependency between two pages for caching and refresh/reparse purposes.

There could be a type for #ifexist which is used on page creation/restore and deletion.
#ifexist with Media: uses the imagelinks table and could likewise be replaced with a type for file upload and deletion.

The #pagesincategory parser function could use this with a type for category membership changes, to be more accurate on the pages that use it.

The #revisiontimestamp parser function and friends use templatelinks to get reparsed, and could use this feature as well, with a type for page creation/restore, deletion, move, and edits/undos/rollbacks.

Indeed, T221795 may also benefit from it. Category counts have at times become inaccurate (see T224321) and the blame has been put on issues with cache invalidation. Having a distinctive mechanism for tracking page dependency and cache invalidation could help resolve that issue.

I am starting to wonder if we should create a new task of the Epic kind, maybe titled as "Refactor the page dependency and cache invalidation process". This task could be a subtask of it, as could T221795 be.

A solution was implemented on English Wikipedia that removes the undesirable Side Effect #1 but apparently removes the desirable side effect(s) as well.

Along with making category counts more reliable, would the Epic fix also solve issues with pages being reported as transcluded when they are not? For example, I patrol pages that transclude "Template:Error" in the Talk: namespace. This category grows significantly over time as many false-positive transclusions are added to it, and I mostly clear it by running a script that null-edits all the reported pages. After the null-edit run finishes, just a handful of true {{error}} transclusions are left behind, to be fixed by gnomes like me who patrol for errors. I just did this the other day, which is why the number of pages found is small at the moment.

Would it be helpful to create a wishlist proposal for the Epic task? Though asking end-users to vote on that would be kind of like asking mobile users to vote for 5G. Heck, flip phone users still don't know why they need 4G other than they need to upgrade to phones that use 4G to make their VoLTE calls when the carriers sunset their 3G networks to free up spectrum for 5G. I have no clue as to what 5G will do for me that 4G can't, despite all the TV ads promoting 5G, so I would not be likely to vote for a task to implement 5G.

> Indeed, T221795 may also benefit from it. Category counts have at times become inaccurate (see T224321) and the blame has been put on issues with cache invalidation. Having a distinctive mechanism for tracking page dependency and cache invalidation could help resolve that issue.

I do not see how T221795 or T224321 are affected by this. My sentence about "more accurate" in the context of category membership is about the number #pagesincategory shows on the page using that parser function. Currently, an addition to a category does not refresh a page that uses #pagesincategory with that category's count.
The bugs mentioned sound like other issues, where the addition is not done or the number shown on the category pages is strange (that would affect #pagesincategory as well, but it is not this bug).

> A solution was implemented on English Wikipedia that removes the undesirable Side Effect #1 but apparently removes the desirable side effect(s) as well.
>
> Along with making category counts more reliable, would the Epic fix also solve issues with pages being reported as transcluded when they are not? For example, I patrol pages that transclude "Template:Error" in the Talk: namespace. This category grows significantly over time as many false-positive transclusions are added to it, and I mostly clear it by running a script that null-edits all the reported pages. After the null-edit run finishes, just a handful of true {{error}} transclusions are left behind, to be fixed by gnomes like me who patrol for errors. I just did this the other day, which is why the number of pages found is small at the moment.
>
> Would it be helpful to create a wishlist proposal for the Epic task? Though asking end-users to vote on that would be kind of like asking mobile users to vote for 5G. Heck, flip phone users still don't know why they need 4G other than they need to upgrade to phones that use 4G to make their VoLTE calls when the carriers sunset their 3G networks to free up spectrum for 5G. I have no clue as to what 5G will do for me that 4G can't, despite all the TV ads promoting 5G, so I would not be likely to vote for a task to implement 5G.

The use of PROTECTIONEXPIRY could be fixed by having a type for page protection and unprotection.
This could also be expanded to GENDER, to have it corrected when user settings are changed, but that would be a long-term solution. The most benefit is for #ifexist.

The problem with the "transcluded when they are not" case is more a side effect of the template used / PROTECTIONEXPIRY.

>> Indeed, T221795 may also benefit from it. Category counts have at times become inaccurate (see T224321) and the blame has been put on issues with cache invalidation. Having a distinctive mechanism for tracking page dependency and cache invalidation could help resolve that issue.
>
> I do not see how T221795 or T224321 are affected by this. My sentence about "more accurate" in the context of category membership is about the number #pagesincategory shows on the page using that parser function. Currently, an addition to a category does not refresh a page that uses #pagesincategory with that category's count.
> The bugs mentioned sound like other issues, where the addition is not done or the number shown on the category pages is strange (that would affect #pagesincategory as well, but it is not this bug).

I want to emphasize your point here. Because the parser is using the pagelinks table for a secondary purpose (dependency tracking and cache invalidation), and because {{PAGESINCATEGORY:...}} has no way to record the dependency it introduces on the members of a category, the parser has no way to know that once a new page is added to a given category, pages that reference it via {{PAGESINCATEGORY:...}} are now invalid and have to be re-cached.

You might argue that it would be undesirable to re-cache pages like this; the PAGESINCATEGORY magic word is already pretty expensive, and adding a cache-purge aspect to it might make things worse. But that is beside the point. The point here is that by not having a clean, dedicated mechanism to track page dependencies, MediaWiki core is giving users inconsistent behavior regarding when pages whose content depends on other pages may or may not be re-cached.

Unlike the {{#ifexist:...}} example that this task is about, PAGESINCATEGORY is not from an extension; it is from MW core.

I think it is time for us to create the Epic task. I will start it shortly.

Some implementation options:

Option 1

  • Just stop putting #ifexist in pagelinks.
  • When existence changes, no cache update is triggered. The page would show the old existence state until the parser cache expires (currently 21 days).
  • There would be no UI for reverse search.
  • Migration is optional; we could just wait for normal refreshLinks updates. Pages would disappear from Special:WhatLinksHere as they are edited or refreshed by the job queue. Or we could write a script, replacing the old refreshLinks.php, that can work at the required scale.

Option 2

  • Add a table for #ifexist.
  • When existence changes, check both pagelinks and the new table.
  • An #ifexist existence change could trigger refreshLinks, not just htmlCacheUpdate, so that links in the new fragment would be registered. That seems like a useful new feature.
  • Special:WhatLinksHere could search for pages linking with #ifexist if desired. Special:WhatLinksHere is implemented as an emulated union across three tables (pagelinks, templatelinks and imagelinks). Providing a search feature would mean adding a fourth table here. But note that this is not requested in the task description.
  • Migration is the same as option 1.
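Option 2 can be sketched as follows. The table and column names ("ifexist_links", "il_from", "il_title") are hypothetical placeholders, not an actual proposed schema:

```python
import sqlite3

# Option 2 model: a dedicated table for #ifexist references, checked
# alongside pagelinks when a page's existence changes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pagelinks (pl_from INTEGER, pl_title TEXT);
    CREATE TABLE ifexist_links (il_from INTEGER, il_title TEXT);
""")
conn.execute("INSERT INTO pagelinks VALUES (10, 'TargetPage')")
conn.execute("INSERT INTO ifexist_links VALUES (20, 'TargetPage')")

def on_existence_change(title):
    """Pages to update when `title` flips between red and blue:
    real linkers (htmlCacheUpdate) and #ifexist callers (refreshLinks)."""
    linkers = {r[0] for r in conn.execute(
        "SELECT pl_from FROM pagelinks WHERE pl_title = ?", (title,))}
    checkers = {r[0] for r in conn.execute(
        "SELECT il_from FROM ifexist_links WHERE il_title = ?", (title,))}
    return linkers, checkers

linkers, checkers = on_existence_change("TargetPage")
print(sorted(linkers), sorted(checkers))  # [10] [20]
# Special:WhatLinksHere keeps reading only pagelinks, so page 20 no
# longer shows up there or on Special:WantedPages.
```

Because the #ifexist rows live in their own table, the special pages need no changes at all unless a search feature over the new table is wanted.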

Option 3

  • Add a field to the pagelinks table, say pl_type.
  • Move the #ifexist feature from ParserFunctions to core since pagelinks is a core table.
  • The current primary key is pl_from/pl_namespace/pl_title which would collide if both an #ifexist link and a normal link were present on a page. Either the new field would be added to the primary key, or we would pick a winner and #ifexist information would be lost in this situation.
  • The new field would have to be added as the first key of the two reverse indexes in order to allow efficient filtering in Special:WhatLinksHere.
  • When page existence changes, to efficiently find both links and #ifexist references, the query would have to select both sections of the index, with pl_type IN (0,1).
  • Migration could be done by defaulting the type to 0. Then run or wait for refreshLinks as in the other two options.
  • Altering the pagelinks table is a large DBA project which would take months. It would presumably be combined with T300222.
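The primary-key point in Option 3 can be demonstrated concretely. This is a toy model with made-up page ids; it assumes the new field joins the key, which is one of the two choices mentioned above:

```python
import sqlite3

# Option 3 model: pl_type joins the primary key so that a page which
# both links to a title and #ifexist-checks it can store two rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pagelinks (
        pl_from      INTEGER,
        pl_namespace INTEGER,
        pl_title     TEXT,
        pl_type      INTEGER,  -- 0 = real link, 1 = #ifexist
        PRIMARY KEY (pl_from, pl_namespace, pl_title, pl_type)
    )""")
# Same source page, same target: one real link and one #ifexist check.
# With the current key (pl_from, pl_namespace, pl_title) the second
# insert would collide; with pl_type in the key, both rows fit.
conn.execute("INSERT INTO pagelinks VALUES (10, 0, 'TargetPage', 0)")
conn.execute("INSERT INTO pagelinks VALUES (10, 0, 'TargetPage', 1)")

# On existence change, both "sections" of the index are selected:
rows = conn.execute(
    "SELECT pl_from, pl_type FROM pagelinks "
    "WHERE pl_namespace = 0 AND pl_title = 'TargetPage' "
    "AND pl_type IN (0, 1) ORDER BY pl_type").fetchall()
print(rows)  # [(10, 0), (10, 1)]
```

Special:WhatLinksHere would instead filter with pl_type = 0, which is why the new field would need to lead the reverse indexes.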

I prefer option 2.

For Option 2, core also has to change so that Special:WhatLinksHere can be extended to search a fourth table. A hook needs to be defined in SpecialWhatLinksHere::showIndirectLinks() to allow the query conditions to be manipulated, and I am guessing another hook needs to be defined later on which would allow the extension to decide how results are shown (or can we use the existing hook in listItem()?)

It feels weird to do all that just for the sake of one specific use case (as opposed to bringing that use case to core, i.e. option #3 above). But I agree that option 2 makes most sense, mainly because the implementation of #3 seems prohibitive, and arguments could be made for why ParserFunctions should not be core.

Option 1 or 2 sound good to me, but I think Option 3 is undesirable from a database point of view. Because the pagelinks table is tall (1.5B rows on enwiki), adding an extra column (with accompanying indexes) would add a lot of data, especially for a use case that's not so common. This reminds me of the watchlist expiry case, in which we went with a dedicated table since expiring watchlist entries are less common.

With option 2, if you need to list that in a special page (e.g. have a dedicated option to combine both in WhatLinksHere), you might run into pagination complexities, but nothing unsolvable.

I don't think "indirect link" would be a good name; "indirect" can refer to ... indirect links (Page A -> Page B -> Page C, with C being an indirect link of A). "Soft link" is a better name if you ask me.

I think option 2 would be ideal, and the table and change to Special:WhatLinksHere UI could be done in core, since other use cases may benefit. See T14019#6638446. That would reduce the complexity to integrate ParserFunctions with it.