
#ifexist for interwiki links
Open, Low, Public

Description

Author: mcdevitd

Description:
Currently, #ifexist does not work for interwiki links. No template call on, say, Wiktionary can tell whether a page on Wikipedia exists, or vice versa.

This functionality would have several potential uses. For example, a targeted template on failed searches and nonexistent pages on any project could detect when the search term exists on another project and *direct* users to the project they want, instead of always giving them links to six projects that might have a page if they check them all (which they are unlikely to do).


Version: unspecified
Severity: enhancement

Details

Reference
bz10237

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 9:49 PM)
bzimport added a project: ParserFunctions.
bzimport set Reference to bz10237.
bzimport added a subscriber: Unknown Object (MLST).

There are a few possible ways to do this. Direct database checks might be the 'cheapest' internally, but they would be ugly to handle with our current config code mish-mash.

An HTTP request might be easiest. I'm not sure we're doing anything sufficiently nice with 404 responses... a HEAD check would be nice. :)
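
A minimal Python sketch of the HEAD idea, assuming the target wiki answers 200 for an existing page and 404 for a missing one (an assumption the objection further down disputes for article views). The function names are hypothetical; only the standard-library urllib calls are real.

```python
import urllib.error
import urllib.request
from typing import Optional


def status_to_existence(status: int) -> Optional[bool]:
    # Assumption: 200 means the page exists, 404 means it is missing.
    # Anything else (5xx, unexpected codes) is reported as "unknown".
    if status == 200:
        return True
    if status == 404:
        return False
    return None


def head_check(url: str, timeout: float = 5.0) -> Optional[bool]:
    """Send a HEAD request for a page URL and classify the status (needs network)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urlopen follows redirects by default, so this is normally the final status.
            return status_to_existence(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; err.code is the status.
        return status_to_existence(err.code)
    except OSError:
        # URLError, timeouts and connection errors all derive from OSError.
        return None
```

Returning None rather than False on errors keeps a network failure from being rendered as "page does not exist".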

Alternatively, we could hit Special:Export just to check for existence.

I would strongly recommend an in-cluster cache for these checks, perhaps a shared memcached area (which could then be cleared when pages are created or deleted).
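
A rough sketch of such a cache, with a plain dict standing in for the shared memcached area. ExistenceCache, the TTL and the on_page_created_or_deleted hook are hypothetical names, not existing MediaWiki code; the TTL is only a fallback for invalidations that never arrive.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (wiki id, page title), e.g. ("enwiki", "Foo")


class ExistenceCache:
    """In-process stand-in for a shared memcached area (hypothetical design)."""

    def __init__(self, ttl: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}

    def get(self, wiki: str, title: str) -> Optional[bool]:
        """Return the cached answer, or None if absent or expired."""
        entry = self._entries.get((wiki, title))
        if entry is None:
            return None
        exists, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[(wiki, title)]
            return None
        return exists

    def set(self, wiki: str, title: str, exists: bool) -> None:
        self._entries[(wiki, title)] = (exists, self._clock())

    def on_page_created_or_deleted(self, wiki: str, title: str) -> None:
        """Hook the owning wiki would call so a stale answer is dropped."""
        self._entries.pop((wiki, title), None)
```

Clearing the cache entry only fixes the cached answer; pages that already rendered with the old answer still need their own update, which is the coherency problem raised next.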

That raises issues of cache coherency and cascading updates, since we have no infrastructure for doing either across wikis.

There are also issues with resolving double prefixes (e.g., w:en:Foo), which is currently not done 'natively'. That could come 'for free' with certain types of HTTP checks, or we might have to build it in.
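
If resolution has to be built in, it could look roughly like this sketch: consume an optional project prefix, then an optional language prefix, and treat the rest as the title. The prefix tables are tiny hypothetical stand-ins for a real interwiki map and language list.

```python
from typing import Dict, NamedTuple, Set


class Target(NamedTuple):
    project: str
    language: str
    title: str


# Hypothetical, heavily trimmed tables; a real wiki would read these from
# its interwiki map and language list.
PROJECT_PREFIXES: Dict[str, str] = {"w": "wikipedia", "wikt": "wiktionary"}
LANGUAGE_CODES: Set[str] = {"en", "de", "fr", "nn"}


def resolve(link: str, home_project: str, home_language: str) -> Target:
    """Resolve links such as 'w:en:Foo', 'en:Foo' or 'wikt:Foo'."""
    parts = link.split(":")
    project, language = home_project, home_language
    i = 0
    # A prefix only counts if something is left after it to serve as the title.
    if len(parts) - i > 1 and parts[i].lower() in PROJECT_PREFIXES:
        project = PROJECT_PREFIXES[parts[i].lower()]
        i += 1
    if len(parts) - i > 1 and parts[i].lower() in LANGUAGE_CODES:
        language = parts[i].lower()
        i += 1
    # Remaining colons belong to the title (e.g. a namespace like "Talk:").
    return Target(project, language, ":".join(parts[i:]))
```

For example, `resolve("w:en:Foo", "wiktionary", "fr")` gives the English Wikipedia page Foo, while `resolve("en:Foo", "wiktionary", "fr")` stays on Wiktionary and switches only the language.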

Just how do you intend to use HTTP headers here? Visiting [[en:fkdjsfkdjfa]] won't give you a 404, just a "This article doesn't exist" message.

Hence the sentence after it.

ayg wrote:

How about the bot API? That must have cheap checks for the existence of a page.

wikt.3.connelm wrote:

If a HEAD request comes in, the most lightweight response would be a 404 (or a 301 redirect), which would make the cross-project stuff enormously easier, no? That would have to be cheaper than invoking the full api.php machinery, right?

ayg wrote:

The PHP stuff has to be invoked either way at the receiving end, to decide whether the article exists or not to start with. Apache can't figure that out by itself, since we aren't talking about simple file existence. The question is how much PHP has to be invoked. If you just request the article you have to initialize the whole MW framework and prepare to show the page, and if it does exist you then actually have to render it and so forth (even for HEAD, because Content-Length needs to be correct). Using the API would just do API initialization and then a single database query, presumably, which should be considerably faster (especially in the worst case where the page actually has to be parsed and rendered).

(In reply to comment #6)

Using the API would just do API initialization and then a single database query, presumably, which should be considerably faster (especially in the worst case where the page actually has to be parsed and rendered).

That is indeed correct. The API also allows checking for multiple pages in one request (and thus one DB query), up to a limit of 500 pages (5,000 for sysops and bots). The URL for a simple check would be:

http://www.mediawiki.org/w/api.php?action=query&prop=info&titles=API|Talk:API|Dog|User:fdfsdfa
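
A sketch of the batched check, assuming the JSON shape of `formatversion=2` (a `pages` list, `missing: true` on absent pages, plus a `normalized` list). The batch size is a parameter; the default of 50 is an assumption and should be matched to the target wiki's actual limit. Helper names are hypothetical.

```python
import json
from typing import Dict, Iterator, List
from urllib.parse import urlencode


def batched(titles: List[str], size: int) -> Iterator[List[str]]:
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def build_query_urls(api: str, titles: List[str], batch_size: int = 50) -> List[str]:
    """One action=query&prop=info URL per batch of titles."""
    urls = []
    for batch in batched(titles, batch_size):
        params = {"action": "query", "prop": "info", "titles": "|".join(batch),
                  "format": "json", "formatversion": "2"}
        urls.append(api + "?" + urlencode(params))  # "|" is sent as %7C
    return urls


def parse_existence(response_text: str) -> Dict[str, bool]:
    """Map each title (normalized and as requested) to True/False existence."""
    query = json.loads(response_text).get("query", {})
    result: Dict[str, bool] = {}
    for page in query.get("pages", []):
        result[page["title"]] = not page.get("missing", False) and not page.get("invalid", False)
    # Also answer under the spelling the caller used (e.g. "User:fdfsdfa").
    for norm in query.get("normalized", []):
        if norm["to"] in result:
            result[norm["from"]] = result[norm["to"]]
    return result
```

Fetching each URL (for example with urllib.request.urlopen) and feeding the body to parse_existence gives one answer per title per request, which is what makes this cheaper than one page view per title.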

(In reply to comment #6)

The PHP stuff has to be invoked either way at the receiving end, to decide whether the article exists or not to start with. Apache can't figure that out by itself, since we aren't talking about simple file existence. The question is how much PHP has to be invoked. If you just request the article you have to initialize the whole MW framework and prepare to show the page, and if it does exist you then actually have to render it and so forth (even for HEAD, because Content-Length needs to be correct).

Unless the 200 or 404 is cached by Squid, of course. :)

Using the API would just do API initialization and then a single database query, presumably, which should be considerably faster (especially in the worst case where the page actually has to be parsed and rendered).

...but not cacheable, so that PHP and DB overhead is always there.

tisane2718 wrote:

I recommend that, in addition to the interwiki #ifexist parser function, we also implement existence checking for interwiki links such as [[m:foo]], with red/blue link coloration depending on the result of that check. Red links can be useful because they tempt users to create a new article. So in some cases we don't want a template to render a missing interwiki target as unclickable text or to hide it altogether; we just want the link to be red.

We can implement such coloration via #ifexist as described at [[w:Wikipedia:Link_color#Making_links_appear_a_different_color_for_everyone]], but it would save keystrokes and simplify the markup just to have existence-detecting interwiki links. Presumably we'll use the LinkEnd or LinkBegin hook for this.
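
A sketch of the markup such a hook might emit, reusing MediaWiki's existing `extiw` (interwiki) and `new` (red link) CSS classes. The function itself is hypothetical; a real implementation would go through the link hooks mentioned above rather than build HTML by hand.

```python
from html import escape
from typing import Optional
from urllib.parse import quote


def render_interwiki_link(base_url: str, title: str, text: str, exists: Optional[bool]) -> str:
    """Render an interwiki anchor, adding the red-link class when the target is missing.

    exists=None (check failed or skipped) renders an ordinary interwiki link.
    """
    classes = ["extiw"]
    if exists is False:
        classes.append("new")
    # Wiki-style URL: spaces become underscores; keep ":" readable in namespaces.
    href = base_url + quote(title.replace(" ", "_"), safe="/:")
    return '<a href="{}" class="{}">{}</a>'.format(escape(href), " ".join(classes), escape(text))
```

For example, a missing [[m:Foo bar]] would render as `<a href="https://meta.wikimedia.org/wiki/Foo_bar" class="extiw new">m:Foo bar</a>`, still clickable but red.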

Red links are already covered by the classic bug 11.

Maybe this will be possible to implement using Lua + Wikidata?

(In reply to Helder from comment #12)

Maybe this will be possible to implement using Lua + Wikidata?

This depends on Wikidata having an item for the target page, which isn't 100% guaranteed. Still, the data is mostly ready for this, and there is an API that could be used to implement it with Wikidata.

https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Death_Star&props=labels&languages=nn&format=json

(but Wiktionary isn't supported yet)
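
A sketch of the Wikidata route: look up the item for a page on one site and check whether it has a sitelink to another. It requests `props=sitelinks` instead of the `props=labels` used in the URL above, and assumes the wbgetentities response shape (`entities` keyed by item id, a `missing` key when no item matches). Function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def sitelink_query_url(source_site: str, title: str) -> str:
    params = {"action": "wbgetentities", "sites": source_site, "titles": title,
              "props": "sitelinks", "format": "json"}
    return API + "?" + urlencode(params)


def linked_title(response_text: str, target_site: str) -> Optional[str]:
    """Title on target_site linked from the same item, or None if no item or sitelink."""
    entities = json.loads(response_text).get("entities", {})
    for entity in entities.values():
        if "missing" in entity:
            continue  # no item exists for the source page
        link = entity.get("sitelinks", {}).get(target_site)
        if link is not None:
            return link["title"]
    return None
```

A None result does not prove that the target page is missing: the source page may simply have no item, or the target project may not be sitelinked yet.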

An #ifexist implemented with Wikidata and Lua (see bug 11 comment 25) would depend on bug 47930.

Native #ifexist support would likely be PHP invoking the target wiki's API, per comment 7, but the hard part is then tracking when the local page containing the #ifexist call needs updating because the link target is created or deleted on the target wiki. Maybe MediaWiki can reuse or generalize the external-site change notification work that Wikidata is doing in bug 47288, so that MediaWiki sites can notify each other of these events (or reuse an existing message bus framework that has PHP support). Non-MediaWiki sites on the interwiki map could be polled periodically by backend processes, with events fired into the message bus when changes are found.
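
The tracking half of that could be sketched as a registry mapping each remote (wiki, title) to the local pages whose #ifexist calls checked it. A create/delete event from the bus then yields the pages to re-parse. This is a hypothetical in-memory design; a real one would persist the mapping, much as local link tables are stored.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

RemotePage = Tuple[str, str]  # (target wiki, title)


class CrossWikiDependencies:
    """Hypothetical registry of which local pages used #ifexist on which remote pages."""

    def __init__(self) -> None:
        self._dependents: DefaultDict[RemotePage, Set[str]] = defaultdict(set)

    def record(self, local_page: str, wiki: str, title: str) -> None:
        """Called while parsing local_page when it checks a remote title."""
        self._dependents[(wiki, title)].add(local_page)

    def forget_page(self, local_page: str) -> None:
        """Drop old entries before local_page is re-parsed."""
        for pages in self._dependents.values():
            pages.discard(local_page)

    def on_remote_change(self, wiki: str, title: str) -> Set[str]:
        """Handle a create/delete event from the bus; return pages to re-parse."""
        # .get() avoids creating empty entries for titles nobody depends on.
        return set(self._dependents.get((wiki, title), set()))
```

The same on_remote_change entry point would serve both pushed notifications from MediaWiki sites and events produced by polling non-MediaWiki sites.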