
"The supplied language code was not recognized." when loading data from wikidata into wikibase
Closed, Duplicate | Public | Bug Report

Description

I'm importing an entity from Wikidata, namely Q2 (the Earth), into a fresh docker-wikibase installation. I'm using pywikibot to copy the labels from one Wikibase to the other. The problem is that when I import the labels I get:
"The supplied language code was not recognized. "
I can import most other entities without problem.

My conclusion is that there is somehow a mismatch between the languages supported in a Wikibase and the ones supported by Wikidata. Does this explanation make sense? Does someone know how to synchronise them?

Thank you
D063520

Event Timeline

DD063520 renamed this task from wikibase, importing from wikidata: The supplied language code was not recognized. to wikibase-> importing entity from wikidata "The supplied language code was not recognized.". (Oct 22 2019, 3:02 PM)

@DD063520: could you provide a sample to reproduce this behaviour?

These are the labels I'm trying to import:

{'en': 'Earth', 'fr': 'Terre', 'uz': 'Yer', 'de': 'Erde', 'it': 'Terra', 'pl': 'Ziemia', 'nb': 'jorda', 'eo': 'Tero', 'ru': 'Земля', 'es': 'Tierra', 'be-tarask': 'Зямля', 'en-gb': 'Earth', 'ja': '地球', 'zh-hant': '地球', 'fi': 'Maa', 'hr': 'Zemlja', 'pt': 'Terra', 'zh': '地球', 'gl': 'Terra', 'hu': 'Föld', 'vo': 'tal', 'cs': 'Země', 'sk': 'Zem', 'uk': 'Земля', 'nn': 'jorda', 'ace': 'Bumoë', 'af': 'Aarde', 'ang': 'Eorðe', 'an': 'Tierra', 'frp': 'Tèrra', 'ast': 'Tierra', 'gn': 'Yvy', 'ay': 'Aka pacha', 'az': 'Yer', 'map-bms': 'Bumi', 'bcl': 'Kinaban', 'bar': 'Eadn', 'bs': 'Zemlja', 'br': 'Douar', 'ca': 'Terra', 'ceb': 'Kalibotan', 'cbk-zam': 'Tierra', 'cy': 'Y Ddaear', 'da': 'Jorden', 'nv': 'Nahasdzáán', 'dsb': 'Zemja', 'et': 'Maa', 'eml': 'Tèra', 'ext': 'Tierra', 'eu': 'Lurra', 'hif': 'Dunia', 'fo': 'Jørðin', 'fy': 'Ierde', 'fur': 'Tiere', 'ga': 'An Domhan', 'gv': 'Yn Chruinney', 'gd': 'Saoghal', 'hak': 'Thi-khiù', 'ha': 'Duniya', 'haw': 'Honua', 'hsb': 'Zemja', 'io': 'Tero', 'ig': 'Àlà', 'ilo': 'Daga', 'id': 'Bumi', 'ia': 'Terra', 'is': 'Jörðin', 'jv': 'bumi', 'kl': 'Nunarsuaq', 'pam': 'Yatu', 'csb': 'Zemia', 'kw': 'Dor', 'rw': 'Isi', 'sw': 'Dunia', 'kg': 'Ntoto', 'ht': 'Latè', 'ku': 'Erd', 'lad': 'Tierra', 'ltg': 'Zeme', 'la': 'Tellus', 'lv': 'Zeme', 'lb': 'Äerd', 'lt': 'Žemė', 'lij': 'Tæra', 'li': 'Eerd', 'ln': 'Mabelé', 'jbo': 'la terdi', 'lmo': 'Tera', 'mg': 'Tany', 'mt': 'Id-Dinja', 'ms': 'Bumi', 'szl': 'Źymja', 'cdo': 'Dê-giù', 'mwl': 'Tierra', 'nah': 'Tlālticpactli', 'nl': 'Aarde', 'nds-nl': 'Eerde', 'nap': 'Terra', 'frr': 'Jard', 'pih': 'Erth', 'nrm': 'Tèrre', 'nov': 'Tere', 'oc': 'Tèrra', 'pfl': 'Erd', 'pap': 'Tera', 'pms': 'Tèra', 'tpi': 'Giraun', 'nds': 'Eer', 'ksh': 'Ääd', 'ro': 'Pământ', 'rmy': 'Phuv', 'rm': 'Terra', 'qu': 'Tiksimuyu', 'se': 'Eana', 'sc': 'Terra', 'sco': 'Yird', 'st': 'Lefatshe', 'nso': 'Lefase', 'sq': 'Toka', 'scn': 'Terra', 'sl': 'Zemlja', 'sh': 'Zemlja', 'su': 'Marcapada', 'sv': 'Jorden', 'tl': 'Daigdig', 'tr': 'Dünya', 'za': 'Giuznamh', 
'vec': 'Tera', 'vep': 'Ma', 'vi': 'Trái Đất', 'wa': 'Daegne', 'vls': 'Eirde', 'war': 'Kalibutan', 'wo': 'Suuf', 'yo': 'àgbáyé', 'diq': 'Dınya', 'zea': 'Aerde', 'ab': 'Адгьыл', 'am': 'መሬት', 'ar': 'الأرض', 'arc': 'ܐܪܥܐ', 'arz': 'الارض', 'as': 'পৃথিৱী', 'be': 'Зямля', 'bg': 'Земя', 'bn': 'পৃথিবী', 'bo': 'སའི་གོ་ལ།', 'chr': 'ᎡᎶᎯ', 'ckb': 'زەوی', 'cu': 'Ꙁємлꙗ', 'cv': 'Çĕр', 'dv': 'ބިން', 'el': 'Γη', 'fa': 'زمین', 'gan': '地球', 'gu': 'પૃથ્વી', 'he': 'כדור הארץ', 'hy': 'Երկիր', 'iu': 'ᓄᓇ', 'ka': 'დედამიწა', 'kk': 'Жер', 'km': 'ផែនដី', 'kn': 'ಭೂಮಿ', 'ko': '지구', 'koi': 'Мушар', 'krc': 'Джер', 'kv': 'Му', 'ky': 'Жер', 'lez': 'Чил', 'lo': 'ໂລກ', 'mdf': 'Мода', 'mhr': 'Мланде', 'mk': 'Земја', 'ml': 'ഭൂമി', 'mn': 'Дэлхий', 'mr': 'पृथ्वी', 'my': 'ကမ္ဘာဂြိုဟ်', 'myv': 'Мода', 'mzn': 'زمین', 'ne': 'पृथ्वी', 'new': 'पृथ्वी', 'or': 'ପୃଥିବୀ', 'os': 'Зæхх', 'pa': 'ਧਰਤੀ', 'pnb': 'زمین', 'ps': 'ځمکه', 'rue': 'Земля', 'sa': 'पृथ्वी', 'sah': 'Сир', 'si': 'මහ පොළොව', 'so': 'Dhulka', 'sr': 'Земља', 'ta': 'புவி', 'te': 'భూమి', 'tg': 'Замин', 'th': 'โลก', 'tt': 'Җир', 'ug': 'يەر شارى', 'ur': 'زمین', 'wuu': '地球', 'xal': 'Делкә һариг', 'xmf': 'დიხაუჩა', 'yi': 'ערד-פלאנעט', 'en-ca': 'Earth', 'de-ch': 'Erde', 'pt-br': 'Planeta Terra', 'yue': '地球', 'zh-cn': '地球', 'zh-hans': '地球', 'zh-sg': '地球', 'zh-hk': '地球', 'zh-tw': '地球', 'zh-mo': '地球', 'gsw': 'Erde', 'pcd': 'Tière', 'min': 'Bumi', 'sn': 'Rinopasi', 'tyv': 'Чер', 'tn': 'Lefatshe', 'bjn': 'Bumi', 'sr-ec': 'Земља', 'sr-el': 'Zemlja', 'ba': 'Ер', 'bxr': 'Дэлхэй', 'ce': 'Дуьне', 'stq': 'Äide', 'pag': 'Earth', 'got': 'Earth', 'crh-latn': 'Dünya', 'sgs': 'Žemė', 'nan': 'Tē-kiû', 'lzh': '地球', 'vro': 'Maa', 'hi': 'पृथ्वी', 'ie': 'Terra', 'tk': 'Ýer', 'ak': 'Ewiase', 'brh': 'dagaar', 'rup': 'Locu', 'zu': 'Umhlaba', 'na': 'Eb', 'lzz': 'Kiana', 'mai': 'पृथ्वी', 'kaa': 'Jer', 'av': 'Ракь', 'mrj': 'Мӱлӓндӹ', 'sd': 'زمين', 'lrc': 'جأهوٙن', 'tet': 'Rai', 'azb': 'یئر', 'bho': 'पृथ्वी', 'kab': 'Tagnit', 'arq': 'أرض', 'ts': 'Misava', 'de-at': 'Erde', 'sma': 
'eatneme', 'ady': 'ЧIыгу', 'jam': 'Oert', 'tcy': 'ಬೂಮಿ', 'ki': 'Thi', 'srn': 'Grontapu', 'gom': 'धर्तरी', 'tg-cyrl': 'Замин', 'tt-cyrl': 'Җир', 'tt-latn': 'Cir', 'lfn': 'Tera', 'crh': 'Dünya', 'atj': 'Aski', 'lg': 'Ensi', 'inh': 'Лаьтта', 'udm': 'Музъем', 'kbp': 'Tɛtʋ', 'dty': 'पृथ्वी', 'mi': 'Whenua', 'co': 'Terra', 'sat': 'ᱫᱷᱟ.ᱨᱛᱤ', 'arn': 'Mapu', 'shn': 'Earth', 'hyw': 'Երկիր', 'zh-my': '地球', 'nqo': 'ߘߎ߱'}

Here is the error:
{"error":{"code":"not-recognized-language","info":"The supplied language code was not recognized.","messages":[{"name":"wikibase-api-not-recognized-language","parameters":[],"html":{"*":"The supplied language code was not recognized."}}],"*":"See http://localhost:8181/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."}}

Strangely, when I insert the labels one after the other, it takes much more time but it works. It is very weird; maybe I'm doing something wrong? Or maybe it is a problem with the Wikibase API itself? I tried to find the relevant API module, but I could only find wbsetlabel, which sets just one label for one language.
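Rather than inserting every label individually, the offending code(s) can be isolated by halving the batch whenever it is rejected. The following is only a minimal sketch of that idea: `find_rejected_codes` and the `submit` callable are hypothetical names, not part of pywikibot or Wikibase. `submit` is assumed to send one batch of labels to the target Wikibase and return `True` if the batch was accepted and `False` if the API answered with an error such as `not-recognized-language`.

```python
from typing import Callable, Dict, List


def find_rejected_codes(
    labels: Dict[str, str],
    submit: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the language codes whose labels the target rejects.

    `submit` is a hypothetical callable (an assumption of this sketch):
    it tries to import one batch of labels and reports success as a bool.
    Batches that are accepted are therefore really imported as a side effect.
    """
    if not labels:
        return []
    if submit(labels):
        # The whole batch was accepted, so it contains no offending code.
        return []
    if len(labels) == 1:
        # A rejected batch of one label pinpoints the offending code.
        return list(labels)
    # Otherwise split the batch in two and examine each half separately.
    items = list(labels.items())
    mid = len(items) // 2
    return (
        find_rejected_codes(dict(items[:mid]), submit)
        + find_rejected_codes(dict(items[mid:]), submit)
    )
```

With roughly 300 labels and a single unsupported code, this needs far fewer requests than one call per language, and every label with an accepted code is still imported along the way.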

This does seem odd and should be investigated.
There was some recent-ish work on languages within Wikibase and why things are accepted or not accepted, so this could be a bug from changes there.
Also, this error message is less than perfect: if a language code is not recognized, it should really say which one.

Addshore renamed this task from wikibase-> importing entity from wikidata "The supplied language code was not recognized." to "The supplied language code was not recognized." when loading data from wikidata into wikibase. (Oct 29 2019, 7:56 AM)
Addshore added a project: Wikidata-Campsite.

Strangely, when I insert the labels one after the other, it takes much more time but it works.

Did you definitely manage to go through every single language code?
It could also be a configuration difference that perhaps we should fix (some language code being enabled in WMF production, but not by default on a Wikibase install).

Hi,

I narrowed it down; apparently the language code that causes the problem is:

{"labels": { "nqo": {"language": "nqo", "value": "ߘߎ߱"}}}

And this language code is not available in the options of wbsetlabel of my local wikibase.

So this should be the source of the problem.
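Until the missing code is enabled on the local install, one possible workaround is to drop unsupported codes before sending the batch. The sketch below assumes the list of codes accepted by the local Wikibase has already been collected by some means (for example from the language options that wbsetlabel lists); `build_label_payload` and `supported_codes` are hypothetical names introduced for illustration. The payload mirrors the `{"labels": {...}}` structure shown above.

```python
from typing import Dict, Iterable, List, Tuple


def build_label_payload(
    labels: Dict[str, str],
    supported_codes: Iterable[str],
) -> Tuple[dict, List[str]]:
    """Build a labels payload restricted to supported language codes.

    `supported_codes` is an assumption of this sketch: the codes the
    target Wikibase accepts, gathered beforehand by the caller.
    Returns the payload and the list of codes that were left out.
    """
    supported = set(supported_codes)
    payload: dict = {"labels": {}}
    skipped: List[str] = []
    for code, value in labels.items():
        if code in supported:
            payload["labels"][code] = {"language": code, "value": value}
        else:
            # Record the code so the omission can be logged or retried later.
            skipped.append(code)
    return payload, skipped
```

For example, `build_label_payload({"en": "Earth", "nqo": "ߘߎ߱"}, ["en", "de"])` keeps only the `en` label in the payload and reports `nqo` as skipped, so the rest of the entity can still be imported in one request.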

When importing the sitelinks of Earth I get a similar problem:

Language 'ban' does not exist in family wikipedia

Looking specifically at nqo, support for it was added to wikidata in T233835
In terms of mediawiki config it seems the language got added to this langlist file / config https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/539162/3/langlist which must make its way into the wikibase list of languages.
The only place I can find the file loaded is https://github.com/wikimedia/operations-mediawiki-config/blob/c65fdd4947dd163d3068987bc5b8bffeb117b1c6/wmf-config/CommonSettings.php#L954
Per https://www.mediawiki.org/wiki/Extension:SiteMatrix#Configuration this is "The path to a list of language codes recognised by MediaWiki".

So this is why these languages exist on wikidata but not in wikibase by default.

My comment at T220798#5106914 touches on this issue, as does T220798#5177954.
I won't merge these though, as I think these tasks are different.

You should be able to add new language codes using https://www.mediawiki.org/wiki/Manual:$wgExtraLanguageCodes and/or https://www.mediawiki.org/wiki/Manual:$wgExtraLanguageNames, and this should (not tested) make Wikibase pick up the new language codes as well.

Let me know if this works.
Would you expect wikibase to come with all of the languages supported on wikidata.org out of the box?