Default interwiki prefixes overlapped with BCP 47 language tags
Closed, Invalid | Public | BUG REPORT

Description

Steps to replicate the issue:

  • Check default interwiki map [1] [2]
  • Check BCP 47 "language" subtags [3]

What happens?:

Several default interwiki codes/prefixes overlap with BCP 47 language tags, which are used as interlanguage prefixes. The matching records in the language subtag registry are:

%%
Type: language
Subtag: dcc
Description: Deccan
Added: 2009-07-29
%%
%%
Type: language
Subtag: doi
Description: Dogri (macrolanguage)
Added: 2005-10-16
Scope: macrolanguage
%%
%%
Type: language
Subtag: git
Description: Gitxsan
Added: 2009-07-29
%%
%%
Type: language
Subtag: zum
Description: Kumzari
Added: 2009-07-29
%%
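
The overlap check described above can be sketched in a few lines of Python. The sketch parses records in the registry's `%%`-separated format and intersects the language subtags with a list of interwiki prefixes. The names `parse_registry` and `find_overlaps` and the inline sample data are illustrative assumptions, not MediaWiki code. A real check would read the full language subtag registry and the actual interwiki map, and would also need to handle folded continuation lines and subtag ranges, which this sketch ignores.

```python
# Illustrative sketch: function names and sample data are assumptions.

SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: dcc
Description: Deccan
Added: 2009-07-29
%%
Type: language
Subtag: doi
Description: Dogri (macrolanguage)
Added: 2005-10-16
Scope: macrolanguage
%%
Type: language
Subtag: git
Description: Gitxsan
Added: 2009-07-29
%%
Type: language
Subtag: zum
Description: Kumzari
Added: 2009-07-29
"""


def parse_registry(text):
    """Return a dict mapping each language subtag to its first description."""
    subtags = {}
    for record in text.split("%%"):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.partition(": ")
            # Keep only the first value of a repeated field (e.g. Description).
            if sep and key not in fields:
                fields[key] = value.strip()
        if fields.get("Type") == "language" and "Subtag" in fields:
            subtags[fields["Subtag"].lower()] = fields.get("Description", "")
    return subtags


def find_overlaps(prefixes, subtags):
    """Return the prefixes that are also registered language subtags, sorted."""
    return sorted(p.lower() for p in prefixes if p.lower() in subtags)


subtags = parse_registry(SAMPLE_REGISTRY)
print(find_overlaps(["doi", "git", "irc", "rfc", "zum"], subtags))
# prints ['doi', 'git', 'zum']
```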

What should have happened instead?:

We shouldn't use valid BCP 47 language tags (or any codes with a valid format) as non-interlanguage interwiki prefixes.

We should use some kind of BCP 47 validator instead of relying on Names.php.

Software version:

Other information:

Event Timeline

aew and twl are also prefixes matching existing language tags.

The other 3-letter prefixes that haven't been assigned as language codes yet are dpd, hdl, irc, rev, rfc, svn, wmf and wqy.

By "valid format", do you mean that it's well-formed, i.e. that it has the right format to potentially be a language tag even if it, or part of it, is not assigned right now?

Over half of the current interwiki prefixes would be well-formed language tags, because the language code part of a language tag can be 2 to 8 letters long (see https://www.rfc-editor.org/rfc/rfc5646.html#section-2.1).

In practice, it's unlikely that there will be any new 2-letter codes, 4-letter codes are only reserved for potential future standards, and 5- to 8-letter codes are unlikely to be approved either (search the ietf-languages mailing list archives for Elfdalian if you're interested). That only leaves 3-letter codes.

I think it would make more sense to avoid prefixes which are 3 letters long. That's much simpler to implement (it doesn't need a copy of the language subtag registry), would cover the existing cases, and would limit the restriction to codes which might be used as interwiki language prefixes (Langcom are unlikely to approve any wikis using language codes with subtags).
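
To make the two options concrete, here is a minimal Python sketch comparing them. `is_wellformed_primary_subtag` applies the 2-to-8-letter rule for the language code part from RFC 5646 section 2.1, and `is_three_letter` applies the simpler 3-letter rule. Both function names are hypothetical. `doi`, `irc` and `wmf` are prefixes mentioned in this task, while `examplewiki` and `x1` are made-up strings, not real map entries.

```python
import re

# Language code part of a tag per RFC 5646 section 2.1: 2 to 8 ASCII letters.
# This deliberately ignores extlang, script, region, and other subtags.
PRIMARY_SUBTAG = re.compile(r"[A-Za-z]{2,8}")


def is_wellformed_primary_subtag(prefix):
    """True if the prefix could be a language code, assigned or not."""
    return PRIMARY_SUBTAG.fullmatch(prefix) is not None


def is_three_letter(prefix):
    """True if the prefix is exactly three ASCII letters."""
    return len(prefix) == 3 and prefix.isascii() and prefix.isalpha()


for p in ["doi", "irc", "wmf", "examplewiki", "x1"]:
    print(p, is_wellformed_primary_subtag(p), is_three_letter(p))
```

The 3-letter rule needs no registry data, at the cost of also rejecting currently unassigned codes such as irc and wmf.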

Note that the task title here conflates https://meta.wikimedia.org/wiki/Interwiki_map (interwikis used on Wikimedia wikis) with https://gerrit.wikimedia.org/g/mediawiki/core/%2B/HEAD/maintenance/interwiki.list (default interwikis for new MediaWiki wikis). Updates to the former are managed on-wiki and not via Phabricator. Updates to the latter are managed via Phabricator.

On specific issues:

Pppery edited projects, added Wikimedia-Interwiki-links; removed MediaWiki-Interwiki.

Per my previous comment, updates to the interwiki map for WMF wikis are managed on Meta-Wiki, not via Phabricator.

The only one of these that's in interwiki.list is DOI, which, well, is almost certainly not going to be removed.

Filed aew and twl over on Meta as well.

Citing a 2020 discussion on Meta for reference (by PiRSquared17): Dogri has an open request for a Wikipedia. However, doi is a macrolanguage code. The individual codes are dgo (Dogri proper) and xnr (Kangri). Crisis averted, probably.