This should probably be tagged as "Pywikibot-interwikidata.py" but there doesn't seem to be an available tag for that item.
The Wikibase extension and the Pywikibot-interwikidata.py script both contain strict hard-coded assumptions which, while likely valid on WMF wikis, may break on third-party wikis:
- T172076: The code assumes the GlobalID follows the naming convention (language code) + (group name), with any hyphens replaced by underscores. It also hard-codes the assumption that the (group name) will always be "wiki*" or "wiktionary" (the WMF project names) and that removing that trailing group name will yield the local language code.
- T221550: The API and core code assume the local database name (wikiID) can be reported to API clients as a presumed-standard GlobalID that is consistent in format, unique across the entire project, and follows all naming conventions. (This won't be fixed at the API level until GlobalID exists in core MediaWiki code and, even then, good luck getting externally-hosted projects to update their configs.)
- T221556: Furthermore, interwikidata.py assumes there are no individual language wikis in the group which are independently hosted (or which lack access to the common repository). The script takes the list of interwikis from the article and makes an API query for each to see if it is already linked to an item, treating anything linked to some other Wikibase Q-item as a conflict. Unfortunately, if the API responds that there is no Wikibase at all behind one language's site, the script does not even attempt to handle this condition and immediately exits. The proper behaviour would be to treat a "We don't have a lord. We're an autonomous collective." response as meaning there is no conflicting Q-item link on that remote wiki (so OK, no error).
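To make the T172076 issue concrete, here is a rough sketch of the suffix-stripping heuristic as I understand it. This is an illustration of the assumption, not the actual Wikibase code; `language_from_global_id` is my own name for it:

```python
# Sketch of the hard-coded assumption in T172076: a GlobalID is presumed
# to be (language code) + (WMF group name), with hyphens in the language
# code replaced by underscores. Anything not following that pattern
# falls through the heuristic unrecognized.

def language_from_global_id(global_id):
    """Guess a language code by stripping a presumed WMF-style suffix."""
    for suffix in ('wiktionary', 'wikibooks', 'wikiquote', 'wiki'):
        if global_id.endswith(suffix):
            # Reverse the hyphen-to-underscore substitution (e.g. zh_tw
            # in the ID corresponds to the zh-tw language code).
            return global_id[:-len(suffix)].replace('_', '-')
    return global_id  # no known suffix: the heuristic silently fails

print(language_from_global_id('enwiki'))      # → 'en'
print(language_from_global_id('zh_twwiki'))   # → 'zh-tw'
print(language_from_global_id('uncy_en'))     # → 'uncy_en' (unrecognized)
```

A third-party ID like `uncy_en` matches none of the WMF group names, so the derivation quietly yields garbage instead of a language code.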
Even if these issues are fixed locally, one problem remains: any externally-hosted wikis will be returning their local database name as WikiID - and that won't match the GlobalID.
That happens because interwikidata.py presumes the API is providing a GlobalID, while the API presumes there is no GlobalID support in core and returns the local database name. That is a design flaw; there are workarounds in other places (such as $wgWBRepoSettings['localClientDatabases']['ptuncyc']='uncyc_pt'; in the Wikibase-repo extension config) but pywikibot/site.py has no table mapping local database names (the API-reported wikiID) to GlobalIDs - it blindly expects the wikiID to always be the GlobalID.
Steps to Reproduce:
Install Pywikibot and try to run interwikidata.py against Uncyclopedia. (This will require patching the code to address T221556 first, which I shall not address here, and the "home wiki" for the bot will need to be set to one of the languages which has access to the repository.)
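For reference, the T221556 patch alluded to above amounts to treating "no Wikibase here" as "no conflict" rather than a fatal error. A rough sketch, where `get_linked_item` and `NoWikibaseError` are hypothetical stand-ins for the per-interwiki API query interwikidata.py performs:

```python
# Sketch of the graceful handling argued for in T221556: a site with no
# Wikibase client behind it should count as "no conflicting Q-item",
# not cause the script to exit.

class NoWikibaseError(Exception):
    """Raised when the queried site has no Wikibase installed at all."""

def get_linked_item(site):
    # Toy stand-in for the API query: pretend 'ru' has no Wikibase.
    if site == 'ru':
        raise NoWikibaseError(site)
    return {'fr': 'Q42'}.get(site)

def conflicting_items(interwikis, our_item):
    """Return the interwiki sites linked to a *different* Q-item."""
    conflicts = []
    for site in interwikis:
        try:
            item = get_linked_item(site)
        except NoWikibaseError:
            # "We're an autonomous collective": no repo means no
            # conflicting Q-item, so carry on with the next language.
            continue
        if item is not None and item != our_item:
            conflicts.append(site)
    return conflicts

print(conflicting_items(['fr', 'ru', 'de'], 'Q1'))  # → ['fr']
```

The current code effectively lets the `NoWikibaseError` case kill the whole run instead of skipping that one language.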
There's a (somewhat-broken) Wikidata repository on *.uncyclopedia.info but the project is a mess of independently-hosted languages (such as Russian, Polish, Korean), items on external wiki farms (Italian is on Miraheze?) and entire clusters of wikis (*.uncyclopedia.co) which are separate from anything on the repo.
In theory, the Wikibase extension code should be capable of creating an outbound inter-language link to an externally-hosted project if its page and API links are in the `sites` table. In practice, everything still goes haywire even after the other bugs listed above have been patched (or kludged, or worked around...) as the wikiID being reported by the individual external projects seems to vary widely, depending on who is hosting each individual language.
Actual Results:
Every time a link to the externally-hosted site is found, if the site's API-reported database name doesn't match the expected GlobalID, the script will report "Unknown site:" and the database name reported by the remote API. This prevents the script from creating outbound interlanguage links to that specific externally-hosted site.
Expected Results:
The only easy way to get the desired result (the script can make outbound-only links to externally-hosted languages, even if that doesn't generate a backlink from the external site) is to add a translation table to be consulted in pywikibot/site.py - something like:
```python
def dbName(self):
    """Return this site's internal id."""
    wikiIDmap = {
        'uncy_cs': 'csuncyc',
        'uncy_de': 'deuncyc',
        'uncy_en': 'enuncyc',
        'uncy_es': 'esuncyc',
        'uncy_fr': 'fruncyc',
        'uncy_he': 'heuncyc',
        'uncy_un': 'en_gbuncyc',
        'engbuncyc': 'en_gbuncyc',
        'zhtwuncyc': 'zh_twuncyc',
        'beidipediawiki': 'aruncyc',
        'nonciclopediawiki': 'ituncyc',
        'uncyclopediawiki': 'zh_cnuncyc',
        'uncyclo_pedia': 'kouncyc',
        'nonsensopedia': 'pluncyc',
        'absurd': 'ruuncyc',
    }
    return wikiIDmap.get(self.siteinfo['wikiid'], self.siteinfo['wikiid'])
```
instead of the original (pywikibot/site.py, lines 2727-2729):
```python
def dbName(self):
    """Return this site's internal id."""
    return self.siteinfo['wikiid']
```
This is a kludge. Ultimately, the wikiIDmap needs to live in configuration, perhaps in user-config.py or added by the user to the generated uncyclopedia-family.py file.
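A configuration-driven version might look something like this. The `wikiid_map` option name is purely hypothetical - pywikibot has no such setting today - and the lookup is shown as a plain function so the idea stands on its own:

```python
# Hypothetical sketch: the wikiID-to-GlobalID translation table lives in
# user configuration rather than being hard-coded in pywikibot/site.py.

# --- what would live in user-config.py (excerpt) ---
wikiid_map = {
    'uncy_en': 'enuncyc',        # repo-hosted language, legacy DB name
    'nonsensopedia': 'pluncyc',  # independently hosted Polish wiki
    'absurd': 'ruuncyc',         # independently hosted Russian wiki
}

# --- what dbName() in pywikibot/site.py would then reduce to ---
def db_name(api_wikiid, mapping):
    """Translate an API-reported wikiID into the expected GlobalID."""
    # Unmapped IDs pass through unchanged, preserving current behaviour
    # for wikis whose database name already matches the GlobalID.
    return mapping.get(api_wikiid, api_wikiid)

print(db_name('absurd', wikiid_map))  # → 'ruuncyc'
print(db_name('enwiki', wikiid_map))  # → 'enwiki' (unmapped, unchanged)
```

Keeping the fall-through means WMF wikis and any third-party wiki that already reports a conforming ID need no entries at all.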
The current code relies on the API returning a GlobalID, but the GlobalID concept (per T221550) simply doesn't exist in the API because it doesn't exist in core code. WMF is a closed, controlled environment where local database names follow one specific, known pattern that matches the GlobalID. A third-party external site? Don't count on anything.