
Parsoid needs a bidirectional interwiki map (and hooks)
Open, Needs Triage · Public

Description

For Parsoid purposes, the interwiki mapping needs to be bidirectional -- that is, we need to be able not just to go from enwiki:$1 to //en.wikipedia.org/wiki/$1 but also, given a URL https://en.wikipedia.org/wiki/Foo, to look up that this matches the enwiki prefix and that the Title part should be Foo.
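To make the two directions concrete, here is a minimal standalone sketch in Python. It is not MediaWiki or Parsoid code: the table contents and the names `to_url` and `match` are assumptions for illustration, and the URL matching is deliberately simplistic.

```python
import re
from typing import Optional, Tuple

# Hypothetical interwiki table: prefix -> URL pattern, with $1 standing for the title.
INTERWIKI = {
    "enwiki": "//en.wikipedia.org/wiki/$1",
    "fiwiki": "//fi.wikipedia.org/wiki/$1",
}


def to_url(prefix: str, title: str) -> str:
    """Forward direction: expand $1 in the pattern of a known prefix."""
    return INTERWIKI[prefix].replace("$1", title)


def _pattern_to_regex(pattern: str):
    """Turn a URL pattern into a compiled regex that captures the $1 part."""
    head, _, tail = pattern.partition("$1")
    # Protocol-relative patterns (//host/...) may be preceded by http: or https:.
    scheme = r"(?:https?:)?" if head.startswith("//") else ""
    return re.compile(scheme + re.escape(head) + r"(.+)" + re.escape(tail) + r"\Z")


def match(url: str) -> Optional[Tuple[str, str]]:
    """Reverse direction: return (prefix, title) for the first matching pattern."""
    for prefix, pattern in INTERWIKI.items():
        m = _pattern_to_regex(pattern).match(url)
        if m:
            return prefix, m.group(1)
    return None
```

Note that the reverse direction has to walk every known pattern, which is why it depends on being able to enumerate the prefixes (or on delegating the whole match to whoever owns them).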

We are currently doing this by calling the InterwikiLookup::getAllPrefixes() method *but* this doesn't play nicely with the InterwikiLoadPrefix hook. That is, InterwikiLoadPrefix is demand-driven, but extensions can add new prefixes via InterwikiLoadPrefix (and the used-in-WMF-production Extension:Interwiki does so, as does the parser test infrastructure) without them showing up in ::getAllPrefixes(). In fact, Extension:Interwiki explicitly requires that the new prefixes *not* show up in ::getAllPrefixes, as its implementation of the InterwikiLoadPrefix hook recursively calls ::getAllPrefixes to avoid delegating lookup of prefixes defined locally.
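The mismatch between demand-driven loading and enumeration can be shown with a toy model. This is a simplified Python sketch, not the real InterwikiLookup: the class `Lookup`, its `load_hooks` list, and the table contents are all hypothetical, and the hook ordering is simpler than what Extension:Interwiki actually does.

```python
from typing import Callable, Dict, List, Optional


class Lookup:
    """Toy stand-in for an interwiki lookup with a demand-driven load hook."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = dict(table)
        # Each hook maps a single prefix to a URL pattern, or returns None.
        self.load_hooks: List[Callable[[str], Optional[str]]] = []

    def fetch(self, prefix: str) -> Optional[str]:
        # Hooks are consulted one prefix at a time, only when someone asks.
        for hook in self.load_hooks:
            url = hook(prefix)
            if url is not None:
                return url
        return self._table.get(prefix)

    def get_all_prefixes(self) -> List[str]:
        # Only the static table can be listed; hooks offer nothing to enumerate.
        return sorted(self._table)


lookup = Lookup({"enwiki": "//en.wikipedia.org/wiki/$1"})
# Hypothetical central table served through a load hook.
central = {"wikipedia": "https://en.wikipedia.org/wiki/$1"}
lookup.load_hooks.append(central.get)
```

With this setup `lookup.fetch("wikipedia")` resolves, yet `"wikipedia"` never appears in `lookup.get_all_prefixes()`, so a reverse lookup built on enumeration cannot see it.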

One solution is to add a "reverse mapping" API and hook to InterwikiLookup, say:

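 public function match( string $url ): ?string {
   // $result is filled in by a hook handler; null means "no match found".
   $result = null;
   // A handler returns false to claim the URL and stop further processing.
   if ( !$this->hookRunner->onInterwikiMatchPrefix( $url, $result ) ) {
     return $result;
   }
   // else....
 }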

This would allow Parsoid to call *just* InterwikiLookup::fetch and InterwikiLookup::match and avoid calling the problematic ::getAllPrefixes method (or indeed, avoid enumerating all prefixes at all).

Extension:Interwiki could continue to recursively invoke ::getAllPrefixes to determine what prefixes the base wiki supports. It would implement the new InterwikiMatchPrefix hook by enumerating the interwiki table from the central DB (caching would help here!) and then returning true to indicate that any matches from the local DB should take precedence.
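One possible reading of that precedence rule, sketched in Python rather than as real hook code: the handler records a candidate from the central table and returns true, and the caller then prefers a local match if there is one. Everything here (`LOCAL`, `CENTRAL`, `central_match_hook`, the `result` dict) is hypothetical, and the pattern matching is simplified to prefix/suffix comparison.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Optional[Tuple[str, str]]  # (prefix, title) or None

# Hypothetical tables: both know en.wikipedia.org, under different prefixes.
LOCAL = {"en": "https://en.wikipedia.org/wiki/$1"}
CENTRAL = {
    "wikipedia": "https://en.wikipedia.org/wiki/$1",
    "commons": "https://commons.wikimedia.org/wiki/$1",
}


def strip_title(pattern: str, url: str) -> Optional[str]:
    """Return the $1 part if url fits pattern, else None."""
    head, _, tail = pattern.partition("$1")
    if url.startswith(head) and url.endswith(tail) and len(url) > len(head) + len(tail):
        return url[len(head):len(url) - len(tail)]
    return None


def find(table: Dict[str, str], url: str) -> Hit:
    """Return the first (prefix, title) in table whose pattern fits url."""
    for prefix, pattern in table.items():
        title = strip_title(pattern, url)
        if title is not None:
            return prefix, title
    return None


def central_match_hook(url: str, result: Dict[str, Hit]) -> bool:
    """Record a central-table candidate, then let processing continue."""
    hit = find(CENTRAL, url)
    if hit is not None:
        result["match"] = hit
    return True  # true: local matches may still take precedence


def match(url: str, hooks: List[Callable[[str, Dict[str, Hit]], bool]]) -> Hit:
    result: Dict[str, Hit] = {"match": None}
    for hook in hooks:
        if not hook(url, result):
            return result["match"]  # a handler claimed the URL outright
    local = find(LOCAL, url)
    # Local match wins; otherwise fall back to whatever a hook suggested.
    return local if local is not None else result["match"]
```

Under these assumptions `match("https://en.wikipedia.org/wiki/Foo", [central_match_hook])` yields the local `en` prefix, while a Commons URL falls back to the central `commons` entry.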

T113034: RFC: Overhaul Interwiki map, unify with Sites and WikiMap proposes to overhaul and unify the interwiki map; it's possible that would lead to a better solution. Putting in an ad-hoc match hook now would at least guide that work by making concrete the requirement that the mapping be bidirectional.

Event Timeline

cscott created this task. Dec 17 2020, 9:59 PM
Restricted Application added a subscriber: Aklapper. Dec 17 2020, 9:59 PM
cscott updated the task description. Dec 17 2020, 11:06 PM

Change 650304 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] WIP: Use new interwikimap alternative in SiteConfig

https://gerrit.wikimedia.org/r/650304

Change 650308 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Add new InterwikiMatchPrefix hook

https://gerrit.wikimedia.org/r/650308

cscott updated the task description. Dec 18 2020, 2:26 PM
cscott updated the task description.

Change 650613 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: ParserTestRunner: Don't use InterwikiLoadPrefix hook

https://gerrit.wikimedia.org/r/650613

Change 650613 merged by jenkins-bot:
[mediawiki/core@master] Deprecate InterwikiLoadPrefix hook

https://gerrit.wikimedia.org/r/650613

Change 651527 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/Interwiki@master] Don't register InterwikiLoadPrefix hook unless it is needed

https://gerrit.wikimedia.org/r/651527

Change 651533 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/Scribunto@master] TitleLibraryTest: Don't use deprecated InterwikiLoadPrefix hook

https://gerrit.wikimedia.org/r/651533

Change 651535 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/UploadWizard@master] UploadWizardConfigTest: avoid using deprecated InterwikiLoadPrefix hook

https://gerrit.wikimedia.org/r/651535

Change 651538 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/skins/MinervaNeue@master] Avoid using deprecated InterwikiLoadPrefix hook in test setup

https://gerrit.wikimedia.org/r/651538

Change 651535 merged by jenkins-bot:
[mediawiki/extensions/UploadWizard@master] UploadWizardConfigTest: avoid using deprecated InterwikiLoadPrefix hook

https://gerrit.wikimedia.org/r/651535

Change 651538 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Avoid using deprecated InterwikiLoadPrefix hook in test setup

https://gerrit.wikimedia.org/r/651538

Apologies for some potentially dumb questions; I got here by following some of the recent patches related to the InterwikiLoadPrefix hook deprecation and I'm a bit confused...

The main reason non-WMF wikis install MediaWiki-extensions-Interwiki is so that a select group or groups of users can view and modify the interwiki table via an on-wiki interface, and so that all wiki users (whether registered or not) can view the contents of that table. AIUI the former feature is disabled on WMF wikis (though I'm not quite sure why!) and only the latter is used on WMF sites; WMF wikis also use the CDB caching, as the interwiki list is generated manually for WMF wikis.

For ShoutWiki I once implemented an extension called "ShoutWiki InterwikiMagic", which was eventually merged into the Interwiki extension by @Isarra (T70241: Add support for global interwiki tables as well as local). This extension allows us to implement global (=shared across all ShoutWiki wikis) inter*wiki* links, so that no matter what ShoutWiki wiki you're on, [[wikipedia:]] points to https://en.wikipedia.org/wiki/$1, and so on, while still preserving the ability to have local inter*language* links, because obviously on http://fi.shoutwiki.com/ (the Finnish version of ShoutWiki Hub, ShoutWiki's central wiki) [[en:]] points to a different wiki than it does on http://fi.starwars.shoutwiki.com/, for example.
This feature is implemented via the aforementioned InterwikiLoadPrefix hook and it uses the $wgInterwikiCentralDB global configuration variable (to set the name of the database containing the global interwiki table; on ShoutWiki this is Hub's database, the database for the site located at http://www.shoutwiki.com/).

My questions are thus:

  1. Is this aforementioned feature in MediaWiki-extensions-Interwiki implemented in T70241 being deprecated?
  2. If so, what is the replacement?
  3. If not, do I, as a sysadmin, need to do anything to keep using $wgInterwikiCentralDB as usual?

The local/global/site interwiki tables are implemented in the CDB caching; that's not expected to change.

But patching the interwiki table dynamically from the DB via the InterwikiLoadPrefix hook is a broken design -- there's no way to enumerate all known prefixes, because Extension:Interwiki explicitly delegates to getAllPrefixes() in the base class *and expects it not to be extended or hooked* in order to determine whether the prefix is defined locally or not. So there's no good way to hook ::getAllPrefixes() without breaking InterwikiLoadPrefix in a vortex of infinite recursion.

In https://gerrit.wikimedia.org/r/c/mediawiki/core/+/650308 I began implementing a new InterwikiMatchPrefix hook that could serve as the mirror image to InterwikiLoadPrefix, but it quickly became very complicated.

So the short story is that WMF uses Extension:Interwiki but doesn't happen to use the InterwikiLoadPrefix hook in production. That hook is only needed for dynamic database lookup of prefixes, which isn't something WMF does -- WMF uses Special:Interwiki and (AIUI) has a wiki page which is editable and then periodically dumped to form the CDB file which is used in production as $wgInterwikiCache. So I'm deprecating the InterwikiLoadPrefix hook, but I don't expect to remove it any time soon. Dynamic lookup of interwiki prefixes won't work with the next generation wikitext parser from WMF (Parsoid) because there's currently no API or hook to enumerate all valid dynamic prefixes, which is needed when converting HTML hrefs back into the appropriate wikitext (i.e., given http://fi.shoutwiki.com/Foo determine that this should be the wikitext [[:fi:Foo]]).

If dynamic interwiki lookup is important, then a new hook would need to be added to (at the least) enumerate all the valid prefixes. It's a bit complicated because if you add the hook to InterwikiLookup::getAllPrefixes in the obvious way then you'll break Extension:Interwiki. So it's probably safest to use the deprecation mechanism and introduce a replacement for InterwikiLoadPrefix at the same time as the other hook (InterwikiGetAllPrefixes or InterwikiMatchPrefix or ....?) is introduced. There's a 'filter' mechanism in the new Hook system which might be appropriate. Then you could shift the dynamic lookup in Extension:Interwiki to use the two new interfaces at the same time, which would help avoid some back compat issues. Perhaps only the InterwikiGetAllPrefixes hook is needed, with a bit of caching to ensure that this hook is called rarely and most title resolution happens via a cached value which is faster than a DB query.
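The "enumeration hook plus caching" idea in the last sentence could look roughly like the following Python sketch. It is only an illustration under assumed names: `CachedPrefixes`, the hook signature (a callable returning a prefix-to-pattern dict), and the TTL-based expiry are not taken from MediaWiki, and a real implementation would likely use a shared cache rather than in-process state.

```python
import time
from typing import Callable, Dict, List, Optional


class CachedPrefixes:
    """Merge hook-provided prefixes with local ones and cache the result."""

    def __init__(
        self,
        local: Dict[str, str],
        hooks: List[Callable[[], Dict[str, str]]],
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._local = dict(local)
        self._hooks = list(hooks)
        self._ttl = ttl
        self._clock = clock
        self._cache: Optional[Dict[str, str]] = None
        self._expires = 0.0

    def all(self) -> Dict[str, str]:
        """Return every known prefix, calling the hooks only when the cache is stale."""
        now = self._clock()
        if self._cache is None or now >= self._expires:
            merged: Dict[str, str] = {}
            for hook in self._hooks:
                merged.update(hook())  # e.g. one query against a central DB
            merged.update(self._local)  # locally defined prefixes win
            self._cache = merged
            self._expires = now + self._ttl
        return self._cache

    def fetch(self, prefix: str) -> Optional[str]:
        """Resolve a single prefix from the cached map, without a per-prefix query."""
        return self.all().get(prefix)
```

In this shape a single enumeration serves both directions: `fetch` reads from the cached map, and a reverse URL match can iterate over `all()` without a separate per-prefix load hook.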

See also T113034: RFC: Overhaul Interwiki map, unify with Sites and WikiMap -- you might want to ask over there as well, because it's possible some of what Extension:Interwiki is doing wants to live inside SiteMatrix or something like that.

Change 651527 merged by jenkins-bot:
[mediawiki/extensions/Interwiki@master] Don't register InterwikiLoadPrefix hook unless it is needed

https://gerrit.wikimedia.org/r/651527

Change 651533 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@master] TitleLibraryTest: Don't use deprecated InterwikiLoadPrefix hook

https://gerrit.wikimedia.org/r/651533