Zotero returns an unvalidated language value, which may refer to the language of the metadata, of the source, or both. The value is typically a language code of some sort, although it can also be free text, e.g. "English" or "Francais". We need to resolve this language value in three ways:
- language -> wikidata item (for statement)
This would use a generated table covering a limited set of values, as the full space of possible values is larger than the set of valid MediaWiki language codes: https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all
We may want to pre-compute these à la: https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/master/repo/maintenance/updateUnits.php
- language -> content language (for label)
- language -> monolingual snak language (for metadata)
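The three lookups above could be sketched roughly as follows. This is a minimal illustration, not an existing implementation: the table names, the `ResolvedLanguage` type, and the tiny sample tables are all hypothetical, and real tables would be generated from the Wikimedia language code lists.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample tables; real ones would be generated or pre-computed.
NAME_TO_CODE = {"english": "en", "francais": "fr", "français": "fr"}
CODE_TO_ITEM = {"en": "Q1860", "fr": "Q150"}  # language code -> Wikidata item
CONTENT_LANGUAGES = {"en", "fr"}              # codes valid for labels
MONOLINGUAL_LANGUAGES = {"en", "fr"}          # codes valid for monolingual text


@dataclass
class ResolvedLanguage:
    item: Optional[str]                  # for the statement
    content_language: Optional[str]      # for the label
    monolingual_language: Optional[str]  # for the metadata


def normalize(raw: str) -> str:
    """Lowercase the raw value and map known language names to codes."""
    value = raw.strip().lower().replace("_", "-")
    return NAME_TO_CODE.get(value, value)


def resolve_language(raw: str) -> ResolvedLanguage:
    """Resolve one raw language value in all three ways; None means unresolved."""
    code = normalize(raw)
    return ResolvedLanguage(
        item=CODE_TO_ITEM.get(code),
        content_language=code if code in CONTENT_LANGUAGES else None,
        monolingual_language=code if code in MONOLINGUAL_LANGUAGES else None,
    )
```

Returning `None` for each unresolved field keeps the three lookups independent, so a value can be usable for a statement even when it is not a valid label language.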
At present, this does not give us a complete way to handle languages. If the value is not valid as a content language or as a monolingual snak language, we need to set a fallback language, and I'm not sure how to do this.
At present we are using OpenRefine to get the Wikidata item for statements, which is slow. Example: https://tools.wmflabs.org/openrefine-wikidata/en/api?query=%7B%22query%22%3A%22hu%22%2C%22limit%22%3A1%2C%22type%22%3A%5B%22Q1288568%22%2C%22Q33742%22%2C%22Q951873%22%2C%22Q33384%22%2C%22Q34770%22%2C%22Q1002697%22%5D%2C%22type_strict%22%3A%22any%22%7D
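For reference, the example URL decodes to a JSON reconciliation query. The sketch below rebuilds that URL from its parts using only the standard library; the function name `build_reconcile_url` is made up for illustration, and the type list is simply copied from the example rather than being a recommendation.

```python
import json
from urllib.parse import quote

# Endpoint and type list taken from the example URL in this task.
ENDPOINT = "https://tools.wmflabs.org/openrefine-wikidata/en/api"
LANGUAGE_TYPES = ["Q1288568", "Q33742", "Q951873", "Q33384", "Q34770", "Q1002697"]


def build_reconcile_url(language_value: str, limit: int = 1) -> str:
    """Build the reconciliation URL for one language value (no request is sent)."""
    query = {
        "query": language_value,
        "limit": limit,
        "type": LANGUAGE_TYPES,
        "type_strict": "any",
    }
    # Compact separators so the encoded JSON matches the example URL exactly.
    encoded = quote(json.dumps(query, separators=(",", ":")), safe="")
    return f"{ENDPOINT}?query={encoded}"
```

Calling `build_reconcile_url("hu")` reproduces the example URL. Each call still costs one HTTP round trip, which is why a pre-computed table would be faster.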
Where a wiki is associated with a language code, some fallbacks are available from that wiki's API, example:
But this does not handle the most common case, where we have a more specific language code like en-US or fcr and want to fall back on en and fr respectively.
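One possible way to cover the subtag case (en-US to en) is to truncate the code one subtag at a time until a valid code is found. The sketch below assumes hyphen- or underscore-separated subtags; the function names, the `valid_codes` argument, and the `default` fallback are hypothetical. Truncation alone does not help with a code that has no subtags, which would still need an explicit mapping table.

```python
from typing import Iterable, List, Optional


def fallback_candidates(code: str) -> List[str]:
    """Return the code and its progressively shorter prefixes, most specific first."""
    parts = code.strip().lower().replace("_", "-").split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def resolve_with_fallback(
    code: str,
    valid_codes: Iterable[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the most specific valid candidate, or `default` if none is valid."""
    valid = set(valid_codes)
    for candidate in fallback_candidates(code):
        if candidate in valid:
            return candidate
    return default
```

For example, `resolve_with_fallback("en-US", {"en", "fr"})` tries `en-us` first and then settles on `en`. What `default` should be for content languages and monolingual snaks remains the open question in this task.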
See also: T217239
ISO to wikimedia language codes gist: https://gist.github.com/mvolz/1e99234373833838581e558e99904201