Interwiki map has duplicate keys with conflicting values
Closed, Resolved | Public | Bug Report

Description

https://noc.wikimedia.org/conf/interwiki.php.txt contains 146 duplicate keys. Of these, 145 have conflicting values and 1 repeats the same value. In every conflict, the last value appears to be the one that should be used. To make the file unambiguous, though, it would be better to remove the duplicates from it.

Here’s a Python script to find the duplicates:

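import re
import urllib.request

URL = 'https://noc.wikimedia.org/conf/interwiki.php.txt'

# Fetch the published interwiki map; the connection is closed on exit.
with urllib.request.urlopen(URL) as iw:
    text = iw.read().decode('utf-8')

# Collect every value seen for each key, in file order.
seen = {}
for key, value in re.findall(r"'(.+?)' => '(.+?)'", text):
    seen.setdefault(key, []).append(value)

# Print only the keys that occur more than once.
for key, values in sorted(seen.items()):
    if len(values) > 1:
        print("'%s' => %s" % (key, values))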

Here are the duplicates as of May 23, 2024:

'__global:c' => ['1 https://commons.wikimedia.org/wiki/$1', '1 https://commons.wikimedia.org/wiki/$1']
'_wiki:be-x-old' => ['1 https://be-x-old.wikipedia.org/wiki/$1', '1 https://be-tarask.wikipedia.org/wiki/$1']
'_wiki:gsw' => ['1 https://gsw.wikipedia.org/wiki/$1', '1 https://als.wikipedia.org/wiki/$1']
'_wiki:lzh' => ['1 https://lzh.wikipedia.org/wiki/$1', '1 https://zh-classical.wikipedia.org/wiki/$1']
'_wiki:nan' => ['1 https://nan.wikipedia.org/wiki/$1', '1 https://zh-min-nan.wikipedia.org/wiki/$1']
'_wiki:rup' => ['1 https://rup.wikipedia.org/wiki/$1', '1 https://roa-rup.wikipedia.org/wiki/$1']
'_wiki:sgs' => ['1 https://sgs.wikipedia.org/wiki/$1', '1 https://bat-smg.wikipedia.org/wiki/$1']
'_wiki:vro' => ['1 https://vro.wikipedia.org/wiki/$1', '1 https://fiu-vro.wikipedia.org/wiki/$1']
'_wiki:yue' => ['1 https://yue.wikipedia.org/wiki/$1', '1 https://zh-yue.wikipedia.org/wiki/$1']
'_wikibooks:be-x-old' => ['1 https://be-x-old.wikibooks.org/wiki/$1', '1 https://be-tarask.wikibooks.org/wiki/$1']
'_wikibooks:gsw' => ['1 https://gsw.wikibooks.org/wiki/$1', '1 https://als.wikibooks.org/wiki/$1']
'_wikibooks:lzh' => ['1 https://lzh.wikibooks.org/wiki/$1', '1 https://zh-classical.wikibooks.org/wiki/$1']
'_wikibooks:nan' => ['1 https://nan.wikibooks.org/wiki/$1', '1 https://zh-min-nan.wikibooks.org/wiki/$1']
'_wikibooks:rup' => ['1 https://rup.wikibooks.org/wiki/$1', '1 https://roa-rup.wikibooks.org/wiki/$1']
'_wikibooks:sgs' => ['1 https://sgs.wikibooks.org/wiki/$1', '1 https://bat-smg.wikibooks.org/wiki/$1']
'_wikibooks:vro' => ['1 https://vro.wikibooks.org/wiki/$1', '1 https://fiu-vro.wikibooks.org/wiki/$1']
'_wikibooks:yue' => ['1 https://yue.wikibooks.org/wiki/$1', '1 https://zh-yue.wikibooks.org/wiki/$1']
'_wikimedia:be-x-old' => ['1 https://be-x-old.wikimedia.org/wiki/$1', '1 https://be-tarask.wikimedia.org/wiki/$1']
'_wikimedia:gsw' => ['1 https://gsw.wikimedia.org/wiki/$1', '1 https://als.wikimedia.org/wiki/$1']
'_wikimedia:lzh' => ['1 https://lzh.wikimedia.org/wiki/$1', '1 https://zh-classical.wikimedia.org/wiki/$1']
'_wikimedia:nan' => ['1 https://nan.wikimedia.org/wiki/$1', '1 https://zh-min-nan.wikimedia.org/wiki/$1']
'_wikimedia:rup' => ['1 https://rup.wikimedia.org/wiki/$1', '1 https://roa-rup.wikimedia.org/wiki/$1']
'_wikimedia:sgs' => ['1 https://sgs.wikimedia.org/wiki/$1', '1 https://bat-smg.wikimedia.org/wiki/$1']
'_wikimedia:vro' => ['1 https://vro.wikimedia.org/wiki/$1', '1 https://fiu-vro.wikimedia.org/wiki/$1']
'_wikimedia:yue' => ['1 https://yue.wikimedia.org/wiki/$1', '1 https://zh-yue.wikimedia.org/wiki/$1']
'_wikinews:be-x-old' => ['1 https://be-x-old.wikinews.org/wiki/$1', '1 https://be-tarask.wikinews.org/wiki/$1']
'_wikinews:gsw' => ['1 https://gsw.wikinews.org/wiki/$1', '1 https://als.wikinews.org/wiki/$1']
'_wikinews:lzh' => ['1 https://lzh.wikinews.org/wiki/$1', '1 https://zh-classical.wikinews.org/wiki/$1']
'_wikinews:nan' => ['1 https://nan.wikinews.org/wiki/$1', '1 https://zh-min-nan.wikinews.org/wiki/$1']
'_wikinews:rup' => ['1 https://rup.wikinews.org/wiki/$1', '1 https://roa-rup.wikinews.org/wiki/$1']
'_wikinews:sgs' => ['1 https://sgs.wikinews.org/wiki/$1', '1 https://bat-smg.wikinews.org/wiki/$1']
'_wikinews:vro' => ['1 https://vro.wikinews.org/wiki/$1', '1 https://fiu-vro.wikinews.org/wiki/$1']
'_wikinews:yue' => ['1 https://yue.wikinews.org/wiki/$1', '1 https://zh-yue.wikinews.org/wiki/$1']
'_wikiquote:be-x-old' => ['1 https://be-x-old.wikiquote.org/wiki/$1', '1 https://be-tarask.wikiquote.org/wiki/$1']
'_wikiquote:gsw' => ['1 https://gsw.wikiquote.org/wiki/$1', '1 https://als.wikiquote.org/wiki/$1']
'_wikiquote:lzh' => ['1 https://lzh.wikiquote.org/wiki/$1', '1 https://zh-classical.wikiquote.org/wiki/$1']
'_wikiquote:nan' => ['1 https://nan.wikiquote.org/wiki/$1', '1 https://zh-min-nan.wikiquote.org/wiki/$1']
'_wikiquote:rup' => ['1 https://rup.wikiquote.org/wiki/$1', '1 https://roa-rup.wikiquote.org/wiki/$1']
'_wikiquote:sgs' => ['1 https://sgs.wikiquote.org/wiki/$1', '1 https://bat-smg.wikiquote.org/wiki/$1']
'_wikiquote:vro' => ['1 https://vro.wikiquote.org/wiki/$1', '1 https://fiu-vro.wikiquote.org/wiki/$1']
'_wikiquote:yue' => ['1 https://yue.wikiquote.org/wiki/$1', '1 https://zh-yue.wikiquote.org/wiki/$1']
'_wikisource:be-x-old' => ['1 https://be-x-old.wikisource.org/wiki/$1', '1 https://be-tarask.wikisource.org/wiki/$1']
'_wikisource:gsw' => ['1 https://gsw.wikisource.org/wiki/$1', '1 https://als.wikisource.org/wiki/$1']
'_wikisource:lzh' => ['1 https://lzh.wikisource.org/wiki/$1', '1 https://zh-classical.wikisource.org/wiki/$1']
'_wikisource:nan' => ['1 https://nan.wikisource.org/wiki/$1', '1 https://zh-min-nan.wikisource.org/wiki/$1']
'_wikisource:rup' => ['1 https://rup.wikisource.org/wiki/$1', '1 https://roa-rup.wikisource.org/wiki/$1']
'_wikisource:sgs' => ['1 https://sgs.wikisource.org/wiki/$1', '1 https://bat-smg.wikisource.org/wiki/$1']
'_wikisource:vro' => ['1 https://vro.wikisource.org/wiki/$1', '1 https://fiu-vro.wikisource.org/wiki/$1']
'_wikisource:yue' => ['1 https://yue.wikisource.org/wiki/$1', '1 https://zh-yue.wikisource.org/wiki/$1']
'_wikiversity:be-x-old' => ['1 https://be-x-old.wikiversity.org/wiki/$1', '1 https://be-tarask.wikiversity.org/wiki/$1']
'_wikiversity:gsw' => ['1 https://gsw.wikiversity.org/wiki/$1', '1 https://als.wikiversity.org/wiki/$1']
'_wikiversity:lzh' => ['1 https://lzh.wikiversity.org/wiki/$1', '1 https://zh-classical.wikiversity.org/wiki/$1']
'_wikiversity:nan' => ['1 https://nan.wikiversity.org/wiki/$1', '1 https://zh-min-nan.wikiversity.org/wiki/$1']
'_wikiversity:rup' => ['1 https://rup.wikiversity.org/wiki/$1', '1 https://roa-rup.wikiversity.org/wiki/$1']
'_wikiversity:sgs' => ['1 https://sgs.wikiversity.org/wiki/$1', '1 https://bat-smg.wikiversity.org/wiki/$1']
'_wikiversity:vro' => ['1 https://vro.wikiversity.org/wiki/$1', '1 https://fiu-vro.wikiversity.org/wiki/$1']
'_wikiversity:yue' => ['1 https://yue.wikiversity.org/wiki/$1', '1 https://zh-yue.wikiversity.org/wiki/$1']
'_wikivoyage:be-x-old' => ['1 https://be-x-old.wikivoyage.org/wiki/$1', '1 https://be-tarask.wikivoyage.org/wiki/$1']
'_wikivoyage:gsw' => ['1 https://gsw.wikivoyage.org/wiki/$1', '1 https://als.wikivoyage.org/wiki/$1']
'_wikivoyage:lzh' => ['1 https://lzh.wikivoyage.org/wiki/$1', '1 https://zh-classical.wikivoyage.org/wiki/$1']
'_wikivoyage:nan' => ['1 https://nan.wikivoyage.org/wiki/$1', '1 https://zh-min-nan.wikivoyage.org/wiki/$1']
'_wikivoyage:rup' => ['1 https://rup.wikivoyage.org/wiki/$1', '1 https://roa-rup.wikivoyage.org/wiki/$1']
'_wikivoyage:sgs' => ['1 https://sgs.wikivoyage.org/wiki/$1', '1 https://bat-smg.wikivoyage.org/wiki/$1']
'_wikivoyage:vro' => ['1 https://vro.wikivoyage.org/wiki/$1', '1 https://fiu-vro.wikivoyage.org/wiki/$1']
'_wikivoyage:yue' => ['1 https://yue.wikivoyage.org/wiki/$1', '1 https://zh-yue.wikivoyage.org/wiki/$1']
'_wiktionary:be-x-old' => ['1 https://be-x-old.wiktionary.org/wiki/$1', '1 https://be-tarask.wiktionary.org/wiki/$1']
'_wiktionary:gsw' => ['1 https://gsw.wiktionary.org/wiki/$1', '1 https://als.wiktionary.org/wiki/$1']
'_wiktionary:lzh' => ['1 https://lzh.wiktionary.org/wiki/$1', '1 https://zh-classical.wiktionary.org/wiki/$1']
'_wiktionary:nan' => ['1 https://nan.wiktionary.org/wiki/$1', '1 https://zh-min-nan.wiktionary.org/wiki/$1']
'_wiktionary:rup' => ['1 https://rup.wiktionary.org/wiki/$1', '1 https://roa-rup.wiktionary.org/wiki/$1']
'_wiktionary:sgs' => ['1 https://sgs.wiktionary.org/wiki/$1', '1 https://bat-smg.wiktionary.org/wiki/$1']
'_wiktionary:vro' => ['1 https://vro.wiktionary.org/wiki/$1', '1 https://fiu-vro.wiktionary.org/wiki/$1']
'_wiktionary:zh-yue' => ['1 https://zh-yue.wiktionary.org/wiki/$1', '1 https://yue.wiktionary.org/wiki/$1']
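
The script above lists duplicated keys but does not separate conflicting values from plain repeats, and it does not show which value would win. A minimal sketch of that extra step follows. It works on a text sample instead of the live file; the function name `classify_duplicates` and the `'_wiki:als'` sample line are assumptions made for illustration, and the "last value wins" rule is the interpretation proposed in the description, not a confirmed property of the file's consumers.

```python
import re

# Same pattern as the script in the description: 'key' => 'value'
ENTRY = re.compile(r"'(.+?)' => '(.+?)'")


def classify_duplicates(text):
    """Split duplicated keys into conflicting and identical; resolve last-wins."""
    seen = {}
    for key, value in ENTRY.findall(text):
        seen.setdefault(key, []).append(value)
    dupes = {k: v for k, v in seen.items() if len(v) > 1}
    # Conflicting: the repeated key carries at least two distinct values.
    conflicting = {k: v for k, v in dupes.items() if len(set(v)) > 1}
    # Identical: the repeated key always carries the same value.
    identical = {k: v for k, v in dupes.items() if len(set(v)) == 1}
    # Assumed resolution rule: keep the final value listed for each key.
    resolved = {k: v[-1] for k, v in seen.items()}
    return conflicting, identical, resolved


# Two duplicated keys from the list above, plus one illustrative unique key.
sample = """
'__global:c' => '1 https://commons.wikimedia.org/wiki/$1',
'_wiki:gsw' => '1 https://gsw.wikipedia.org/wiki/$1',
'__global:c' => '1 https://commons.wikimedia.org/wiki/$1',
'_wiki:gsw' => '1 https://als.wikipedia.org/wiki/$1',
'_wiki:als' => '1 https://als.wikipedia.org/wiki/$1',
"""
conflicting, identical, resolved = classify_duplicates(sample)
print(sorted(conflicting))    # ['_wiki:gsw']
print(sorted(identical))      # ['__global:c']
print(resolved['_wiki:gsw'])  # 1 https://als.wikipedia.org/wiki/$1
```

Run against the full file text, `len(conflicting)` and `len(identical)` would give the two counts quoted in the description.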

Event Timeline

I think the dumpInterwiki script could benefit from a small refactor: instead of outputting on the fly, it should build a proper list, do the de-duplication (explicitly or implicitly), and then do the output...
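
The idea in this comment (collect first, de-duplicate, then output) can be sketched in a few lines. This is only an illustration in Python, not the actual dumpInterwiki change; the names `build_map` and `render` are hypothetical, and the sorted output order is an assumption. Storing entries in a dictionary de-duplicates implicitly, because a later entry for the same key replaces the earlier one.

```python
def build_map(entries):
    """Collect (key, value) pairs; a later pair replaces an earlier one."""
    result = {}
    for key, value in entries:
        result[key] = value
    return result


def render(mapping):
    """Emit each key exactly once, in sorted order (ordering is an assumption)."""
    return "\n".join("'%s' => '%s'," % (k, v) for k, v in sorted(mapping.items()))


# One duplicated key taken from the list in the description.
entries = [
    ('_wiki:gsw', '1 https://gsw.wikipedia.org/wiki/$1'),
    ('_wiki:gsw', '1 https://als.wikipedia.org/wiki/$1'),
]
output = render(build_map(entries))
print(output)  # '_wiki:gsw' => '1 https://als.wikipedia.org/wiki/$1',
```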

Change #1035408 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/WikimediaMaintenance@master] dumpInterwiki: Stop outputting on the fly, to allow de-duplication

https://gerrit.wikimedia.org/r/1035408

Change #1035389 had a related patch set uploaded (by Reedy; author: Reedy):

[operations/mediawiki-config@master] interwiki.php: Remove duplicates

https://gerrit.wikimedia.org/r/1035389

It looks like most of the duplicates come from these aliases:

	/**
	 * Language aliases, usually configured as redirects to the real wiki in apache
	 * Interlanguage links are made directly to the real wiki
	 * @var array
	 */
	protected static $languageAliases = [
		# Nasty legacy codes
		'cz' => 'cs',
		'be-x-old' => 'be-tarask',
		'dk' => 'da',
		'epo' => 'eo',
		'jp' => 'ja',
		'zh-cn' => 'zh',
		'zh-tw' => 'zh',
		# Real ISO language codes to our fake ones
		'cmn' => 'zh',
		'egl' => 'eml',
		'en-simple' => 'simple', # T283149
		'gsw' => 'als',
		'lzh' => 'zh-classical',
		'nan' => 'zh-min-nan',
		'nb' => 'no',
		'rup' => 'roa-rup',
		'sgs' => 'bat-smg',
		'vro' => 'fiu-vro',
		'yue' => 'zh-yue',
	];
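
To check how many of the reported duplicates these aliases account for, one could test each duplicated key against the table. The sketch below is a rough heuristic, not part of any patch: `LANGUAGE_ALIASES` copies only the eight alias codes that appear in the reported list, and the helper names `subdomain` and `explained_by_alias` are hypothetical. A key counts as explained when its language code is an alias and its last value points at the alias target.

```python
import re

# Subset of the $languageAliases table quoted above.
LANGUAGE_ALIASES = {
    'be-x-old': 'be-tarask',
    'gsw': 'als',
    'lzh': 'zh-classical',
    'nan': 'zh-min-nan',
    'rup': 'roa-rup',
    'sgs': 'bat-smg',
    'vro': 'fiu-vro',
    'yue': 'zh-yue',
}


def subdomain(value):
    """Return the first host label of a value such as '1 https://als.wikipedia.org/wiki/$1'."""
    match = re.search(r"https://([^./]+)\.", value)
    return match.group(1) if match else None


def explained_by_alias(key, values):
    """True if the key's code is an alias and the last value targets the real wiki."""
    _, _, code = key.partition(':')
    target = LANGUAGE_ALIASES.get(code)
    return target is not None and subdomain(values[-1]) == target


print(explained_by_alias('_wiki:gsw', [
    '1 https://gsw.wikipedia.org/wiki/$1',
    '1 https://als.wikipedia.org/wiki/$1',
]))  # True
print(explained_by_alias('_wiktionary:zh-yue', [
    '1 https://zh-yue.wiktionary.org/wiki/$1',
    '1 https://yue.wiktionary.org/wiki/$1',
]))  # False
print(explained_by_alias('__global:c', [
    '1 https://commons.wikimedia.org/wiki/$1',
    '1 https://commons.wikimedia.org/wiki/$1',
]))  # False
```

By this heuristic, the `'_wiktionary:zh-yue'` and `'__global:c'` entries in the reported list are not explained by the alias table, which fits the comment's wording that most, not all, duplicates come from the aliases.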

Change #1035408 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] dumpInterwiki: Stop outputting on the fly, to allow de-duplication

https://gerrit.wikimedia.org/r/1035408

Change #1035389 abandoned by Reedy:

[operations/mediawiki-config@master] interwiki.php: Remove duplicates

Reason:

https://gerrit.wikimedia.org/r/1035389

Change #1040766 had a related patch set uploaded (by Reedy; author: Reedy):

[operations/mediawiki-config@master] interwiki(-labs).php: De-duplicate and update from meta

https://gerrit.wikimedia.org/r/1040766

Change #1040766 merged by jenkins-bot:

[operations/mediawiki-config@master] interwiki(-labs).php: De-duplicate and update from meta

https://gerrit.wikimedia.org/r/1040766

Reedy claimed this task.