ifexist function uses pagelinks table in lieu of better options
OpenPublic

Description

Author: c.stafford

Description:
this problem still exists as of today (2007-11-18).

i've traced it down to the line $parser->mOutput->addLink( $title, $id ); that was added by tstarling in revision 19892 on Mon Feb 12 10:24:23 2007 UTC with the reason "Register a link on #ifexist. Otherwise this breaks cache coherency.."

i can find no logical reasoning for this change. all it is doing is checking if the target exists, and outputting one of two user supplied text blocks. and that is all it should do. it is not making a link to target, nor does it display a link to target anywhere in the scope of this functions code so why does the target need to be added to the link list?

granted, i do not have a complete grasp of the internals of the parser nor the cache systems, but the feedback noise on special:whatlinkshere renders the page useless.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=16584
https://bugzilla.wikimedia.org/show_bug.cgi?id=15735
https://bugzilla.wikimedia.org/show_bug.cgi?id=10857
https://bugzilla.wikimedia.org/show_bug.cgi?id=31628
https://bugzilla.wikimedia.org/show_bug.cgi?id=71637

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz12019.
bzimport created this task.Via LegacyNov 18 2007, 10:13 AM
bzimport added a comment.Via ConduitNov 20 2007, 6:29 PM

david.sledge wrote:

According to the source code and comments, when the static class method "Title::newFromText()" creates a new Title object, that object is added to a private map so that if multiple links to the same page exist, it doesn't have to go through the process of creating another Title object. It can just retrieve one from the cache, which saves a bit of time.

I'm guessing that every Title object in the map must also be in the underlying collection accessed by the method "$parser->mOutput->addLink()" otherwise there's processing/page-rendering issues.

The best solution of which I can think is to create a new static method in the Title class whose only purpose is to check whether or not a page exists based on a given text, and have this check done without fiddling with the underlying Title map or creating a Title object:

Title::exists($text, $defaultNamespace = NS_MAIN)

(Such a method may already exist. If so, then parser extension needs to be changed so that said method is called instead of "Title::newFromText().")

Tbleher added a comment.Via ConduitMar 26 2008, 6:27 PM

(In reply to comment #0)

i've traced it down to the line $parser->mOutput->addLink( $title, $id ); that
was added by tstarling in revision 19892 on Mon Feb 12 10:24:23 2007 UTC with
the reason "Register a link on #ifexist. Otherwise this breaks cache
coherency.."

i can find no logical reasoning for this change. all it is doing is checking if
the target exists, and outputting one of two user supplied text blocks. and
that is all it should do. it is not making a link to target, nor does it
display a link to target anywhere in the scope of this functions code so why
does the target need to be added to the link list?

I guess it's there so that the page containing the #ifexist check can be purged when the target page gets created.

tstarling added a comment.Via ConduitSep 26 2008, 3:44 PM

See bug 10857.

MZMcBride added a comment.Via ConduitSep 26 2008, 3:54 PM

I've generalized the summary for this bug following the resolution of bug 15731.

#ifexist can generate links like [[Category:Foobar]] but it places the rows in the pagelinks table instead of using of a more appropriate table (in this case it would be the categorylinks table).

This pollutes the results of cross namespace link searches and other tools.

bzimport added a comment.Via ConduitSep 26 2008, 4:06 PM

ayg wrote:

Comment 2 is correct. It needs to go in some links table or other so the page can be purged if necessary. I'm pretty sure that the Title cache referred to in comment 1 is irrelevant. I guess we could create an ifexistlinks table, but that seems pretty stupid. Adding entries to pagelinks seems as correct to me as anything.

(In reply to comment #4)

#ifexist can generate links like [[Category:Foobar]] but it places the rows in
the pagelinks table instead of using of a more appropriate table (in this case
it would be the categorylinks table).

Why is the categorylinks table more appropriate? That would be completely broken, it would show up on category pages just for existence checking. Likewise templatelinks would cause it to show up in "templates used on this page". pagelinks is the generic one and should be used for random stuff like this.

This pollutes the results of cross namespace link searches and other tools.

Putting it in the categorylinks table would effectively categorize the page, which is much less correct. Putting it in pagelinks doesn't disrupt anything in the software proper, except arguably WhatLinksHere (but only arguably: it does kind of link there). If people are assuming that pagelinks entries correspond to actual clickable links on the page, yes, this breaks that assumption.

We're both talking about the same thing, right? {{#ifexist:Category:Foobar|...}} adds a pagelinks entry for [[Category:Foobar]]. {{#ifexist:Somerandompage|[[Category:Foobar]]}} adds a pagelinks entry for [[Somerandompage]], and (maybe) a categorylinks entry for [[Category:Foobar]]. Yes?

bzimport added a comment.Via ConduitDec 28 2008, 5:15 AM

c.stafford wrote:

We're both talking about the same thing, right?
{{#ifexist:Category:Foobar|...}} adds a pagelinks entry for [[Category:Foobar]].

yes

->exists() on an Article causes this (i assume this is what the #ifexist parser function does too, have not looked)

bzimport added a comment.Via ConduitJul 19 2009, 6:22 AM

herd wrote:

*** Bug 15735 has been marked as a duplicate of this bug. ***

MarkAHershberger added a comment.Via ConduitApr 12 2011, 4:19 PM

Punting this to the new parser Brion has under development.

Platonides added a comment.Via ConduitNov 27 2011, 4:38 PM

This has little to do with the parser. It's about the way that information is stored.
I suppose we could add an ENUM to pagelinks to state if it's a link check or a ifexists, but pagelinks does seem the right table .

bzimport added a comment.Via ConduitJun 22 2012, 12:11 AM

beachtyper wrote:

This bug is really annoying if your userbase uses Special:WantedPages to guide their editing and you've got templates which liberally use #ifexist. Has any progress been made in resolving it?

MZMcBride added a comment.Via ConduitFeb 6 2013, 9:33 AM

Adding a pl_status column or similar to the pagelinks table seems like the path forward for this bug.

I thought alterations to Wikimedia wikis' pagelinks tables were off-limits due to their size, but Tim assures me that after recent alterations to the revision and categorylinks tables, alterations to pagelinks are doable.

Parent5446 added a comment.Via ConduitJun 19 2014, 8:38 PM

I'm wondering, are there any issues with just merging all the link tables? It would allow extensions to use their own custom link types rather than having this issue.

JanZerebecki added a comment.Via ConduitJun 19 2014, 9:04 PM

Do not try to merge all link types to fix this bug. This is really a subtype of pagelink instead of just another link type like templatelinks. The previous idea to just add a column to pagelinks is better.

Not sure if other link types also share similarities regaring parser cache invalidation, life cycle, etc. But trying that would make this change much more bigger, complicated and harder to validate. Also such a schema change would not be forward compatible, which means a maintenance window during which no one could edit.

Parent5446 added a comment.Via ConduitJun 19 2014, 10:02 PM

(In reply to Jan Zerebecki from comment #13)

Do not try to merge all link types to fix this bug. This is really a subtype
of pagelink instead of just another link type like templatelinks. The
previous idea to just add a column to pagelinks is better.

Well it wouldn't be just to fix this bug. I'm sure there are other extensions that would like to record their own types of links.

Also such a schema change
would not be forward compatible, which means a maintenance window during
which no one could edit.

This is probably not true. I'm sure it's possible to make such a schema change without interrupting availability. I'm just speculating, but assuming that statement-based replication is used, you can bring down a slave server, combine the tables into one, set up views as replacements for the deleted tables, restart the slave, and when it is up to date, make the slave the new master and replicate the schema changes across the remaining servers in the cluster.

JanZerebecki added a comment.Via ConduitJun 20 2014, 10:45 AM

Yes you are right, there are ways to do that without interrupting availability. Another possible one is to first create the new table, then change the code to deal with both tables but move data to the new table and later remove the understanding of the old empty tables. However this direction is IMHO more complicated than necessary.

For the new ifexist pagelink type to only be known by the extension, everything that uses pagelinks needs to call back into the extension. Do you know the complete list?

Parent5446 added a comment.Via ConduitJun 20 2014, 1:34 PM

Well just to be clear, if we're going to be making any schema changes, we are not just going to randomly add a link type field to the pagelinks table, primarily because core has no use for such field. That was the reason I suggested the combined table approach.

If you want to solve this bug without a schema change, your only other option is to make a separate extension table for ParserFunctions.

JanZerebecki added a comment.Via ConduitJun 21 2014, 2:24 PM

This wouldn't be randomly adding a link type field. It would be adding support for pagelinks that do not count as a wanted page to core, even though core would not create such pagelinks on itself.

Creating an extension table for these links poses the same question: Which functionality (core or in Extensions) that currently obtains its information (possibly indirectly) from the pagelinks table would then need hooks in core to call into this new extension table?

Jackmcbarn added a comment.Via ConduitOct 4 2014, 2:02 PM
  • Bug 71636 has been marked as a duplicate of this bug. ***
Anomie added a comment.Via ConduitOct 5 2014, 8:07 PM
  • Bug 71637 has been marked as a duplicate of this bug. ***
Atamari added a subscriber: Atamari.Via WebMar 11 2015, 10:55 AM
Man77 added a subscriber: Man77.Via WebMar 23 2015, 10:00 PM
Jdforrester-WMF removed a project: Newparser.Via WebThu, Jun 4, 9:39 PM

Add Comment