ifexist function uses pagelinks table in lieu of better options
Open, LowPublic

Description

Author: c.stafford

Description:
this problem still exists as of today (2007-11-18).

i've traced it down to the line $parser->mOutput->addLink( $title, $id ); that was added by tstarling in revision 19892 on Mon Feb 12 10:24:23 2007 UTC with the reason "Register a link on #ifexist. Otherwise this breaks cache coherency.."

i can find no logical reasoning for this change. all it is doing is checking if the target exists, and outputting one of two user supplied text blocks. and that is all it should do. it is not making a link to target, nor does it display a link to target anywhere in the scope of this functions code so why does the target need to be added to the link list?

granted, i do not have a complete grasp of the internals of the parser nor the cache systems, but the feedback noise on special:whatlinkshere renders the page useless.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=16584
https://bugzilla.wikimedia.org/show_bug.cgi?id=15735
https://bugzilla.wikimedia.org/show_bug.cgi?id=10857
https://bugzilla.wikimedia.org/show_bug.cgi?id=31628
https://bugzilla.wikimedia.org/show_bug.cgi?id=71637

This card tracks a proposal from the 2015 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey

This proposal received 11 support votes, and was ranked #62 out of 107 proposals. https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Miscellaneous#Error_categorization_by_.23ifexist_bug

Details

Security
None
Reference
bz12019
bzimport set Reference to bz12019.
bzimport added a subscriber: Unknown Object (MLST).

david.sledge wrote:

According to the source code and comments, when the static class method "Title::newFromText()" creates a new Title object, that object is added to a private map so that if multiple links to the same page exist, it doesn't have to go through the process of creating another Title object. It can just retrieve one from the cache, which saves a bit of time.

I'm guessing that every Title object in the map must also be in the underlying collection accessed by the method "$parser->mOutput->addLink()" otherwise there's processing/page-rendering issues.

The best solution of which I can think is to create a new static method in the Title class whose only purpose is to check whether or not a page exists based on a given text, and have this check done without fiddling with the underlying Title map or creating a Title object:

Title::exists($text, $defaultNamespace = NS_MAIN)

(Such a method may already exist. If so, then parser extension needs to be changed so that said method is called instead of "Title::newFromText().")

(In reply to comment #0)

i've traced it down to the line $parser->mOutput->addLink( $title, $id ); that
was added by tstarling in revision 19892 on Mon Feb 12 10:24:23 2007 UTC with
the reason "Register a link on #ifexist. Otherwise this breaks cache
coherency.."

i can find no logical reasoning for this change. all it is doing is checking if
the target exists, and outputting one of two user supplied text blocks. and
that is all it should do. it is not making a link to target, nor does it
display a link to target anywhere in the scope of this functions code so why
does the target need to be added to the link list?

I guess it's there so that the page containing the #ifexist check can be purged when the target page gets created.

See bug 10857.

I've generalized the summary for this bug following the resolution of bug 15731.

#ifexist can generate links like [[Category:Foobar]] but it places the rows in the pagelinks table instead of using of a more appropriate table (in this case it would be the categorylinks table).

This pollutes the results of cross namespace link searches and other tools.

ayg wrote:

Comment 2 is correct. It needs to go in some links table or other so the page can be purged if necessary. I'm pretty sure that the Title cache referred to in comment 1 is irrelevant. I guess we could create an ifexistlinks table, but that seems pretty stupid. Adding entries to pagelinks seems as correct to me as anything.

(In reply to comment #4)

#ifexist can generate links like [[Category:Foobar]] but it places the rows in
the pagelinks table instead of using of a more appropriate table (in this case
it would be the categorylinks table).

Why is the categorylinks table more appropriate? That would be completely broken, it would show up on category pages just for existence checking. Likewise templatelinks would cause it to show up in "templates used on this page". pagelinks is the generic one and should be used for random stuff like this.

This pollutes the results of cross namespace link searches and other tools.

Putting it in the categorylinks table would effectively categorize the page, which is much less correct. Putting it in pagelinks doesn't disrupt anything in the software proper, except arguably WhatLinksHere (but only arguably: it does kind of link there). If people are assuming that pagelinks entries correspond to actual clickable links on the page, yes, this breaks that assumption.

We're both talking about the same thing, right? {{#ifexist:Category:Foobar|...}} adds a pagelinks entry for [[Category:Foobar]]. {{#ifexist:Somerandompage|[[Category:Foobar]]}} adds a pagelinks entry for [[Somerandompage]], and (maybe) a categorylinks entry for [[Category:Foobar]]. Yes?

c.stafford wrote:

We're both talking about the same thing, right?
{{#ifexist:Category:Foobar|...}} adds a pagelinks entry for [[Category:Foobar]].

yes

->exists() on an Article causes this (i assume this is what the #ifexist parser function does too, have not looked)

herd wrote:

*** Bug 15735 has been marked as a duplicate of this bug. ***

Punting this to the new parser Brion has under development.

This has little to do with the parser. It's about the way that information is stored.
I suppose we could add an ENUM to pagelinks to state if it's a link check or a ifexists, but pagelinks does seem the right table .

beachtyper wrote:

This bug is really annoying if your userbase uses Special:WantedPages to guide their editing and you've got templates which liberally use #ifexist. Has any progress been made in resolving it?

Adding a pl_status column or similar to the pagelinks table seems like the path forward for this bug.

I thought alterations to Wikimedia wikis' pagelinks tables were off-limits due to their size, but Tim assures me that after recent alterations to the revision and categorylinks tables, alterations to pagelinks are doable.

I'm wondering, are there any issues with just merging all the link tables? It would allow extensions to use their own custom link types rather than having this issue.

Do not try to merge all link types to fix this bug. This is really a subtype of pagelink instead of just another link type like templatelinks. The previous idea to just add a column to pagelinks is better.

Not sure if other link types also share similarities regaring parser cache invalidation, life cycle, etc. But trying that would make this change much more bigger, complicated and harder to validate. Also such a schema change would not be forward compatible, which means a maintenance window during which no one could edit.

(In reply to Jan Zerebecki from comment #13)

Do not try to merge all link types to fix this bug. This is really a subtype
of pagelink instead of just another link type like templatelinks. The
previous idea to just add a column to pagelinks is better.

Well it wouldn't be just to fix this bug. I'm sure there are other extensions that would like to record their own types of links.

Also such a schema change
would not be forward compatible, which means a maintenance window during
which no one could edit.

This is probably not true. I'm sure it's possible to make such a schema change without interrupting availability. I'm just speculating, but assuming that statement-based replication is used, you can bring down a slave server, combine the tables into one, set up views as replacements for the deleted tables, restart the slave, and when it is up to date, make the slave the new master and replicate the schema changes across the remaining servers in the cluster.

Yes you are right, there are ways to do that without interrupting availability. Another possible one is to first create the new table, then change the code to deal with both tables but move data to the new table and later remove the understanding of the old empty tables. However this direction is IMHO more complicated than necessary.

For the new ifexist pagelink type to only be known by the extension, everything that uses pagelinks needs to call back into the extension. Do you know the complete list?

Well just to be clear, if we're going to be making any schema changes, we are not just going to randomly add a link type field to the pagelinks table, primarily because core has no use for such field. That was the reason I suggested the combined table approach.

If you want to solve this bug without a schema change, your only other option is to make a separate extension table for ParserFunctions.

This wouldn't be randomly adding a link type field. It would be adding support for pagelinks that do not count as a wanted page to core, even though core would not create such pagelinks on itself.

Creating an extension table for these links poses the same question: Which functionality (core or in Extensions) that currently obtains its information (possibly indirectly) from the pagelinks table would then need hooks in core to call into this new extension table?

  • Bug 71636 has been marked as a duplicate of this bug. ***
Anomie added a comment.Oct 5 2014, 8:07 PM
  • Bug 71637 has been marked as a duplicate of this bug. ***
Man77 added a subscriber: Man77.Mar 23 2015, 10:00 PM

beachtyper wrote:

This bug is really annoying if your userbase uses Special:WantedPages to guide their editing and you've got templates which liberally use #ifexist. Has any progress been made in resolving it?

This point is worth reiterating. The end user experience problem here is that applying this function to a page that doesn't exist causes it to show up in WantedPages even though that is (arguably) not the expected behavior. Asking whether a page exists doesn't make it wanted.

Qgil removed a subscriber: Qgil.Oct 14 2015, 7:49 AM
DannyH set Security to None.
IMPORTANT: If you are a community developer interested in working on this task: The Wikimedia Hackathon 2016 (Jerusalem, March 31 - April 3) focuses on Community-Wishlist-Survey projects. There is some budget for sponsoring volunteer developers. THE DEADLINE TO REQUEST TRAVEL SPONSORSHIP IS TODAY, JANUARY 21. Exceptions can be made for developers focusing on Community Wishlist projects until the end of Sunday 24, but not beyond. If you or someone you know is interested, please REGISTER NOW.
Restricted Application added a subscriber: JEumerus. · View Herald TranscriptJan 21 2016, 2:53 PM
DannyH edited the task description. (Show Details)Feb 6 2016, 12:14 AM
Restricted Application added a subscriber: TerraCodes. · View Herald TranscriptTue, Apr 19, 7:43 PM

Add Comment