Need: As a curious Wikipedian interested in the use of languages online, I would like Wikimedia Statistics (and the Wikimedia REST API) to display results by "big regions" instead of countries only, so that I can have more granularity in my analyses and identify the most used edition by region.
Currently, Wikimedia Statistics offers data by "country" (more precisely, by ISO 3166-1 alpha-2 code, so some jurisdictions that are not countries have their own code). Using this feature (and some code), I updated existing maps showing the most used edition of Wikipedia. Here's the result. It's now used on several Wikipedias. (If you're curious, I wrote an article about this.)
Tomasz Kamusella, a researcher whose focus is the use of languages in cyberspace, said that this map lacked granularity:
- "The picture of the situation could be delivered in an improved 'resolution' if India's regional states and China's or Russia's autonomous republics could be treated as separate entities. Otherwise, we have a unit on the map for Malta with 0.4m inhabitants, but not for West Bengal with 180m inhabitants..."
- "granularity in data presentation is a problem when only states are employed as a standard unit for this purpose. Hence, it is important to think about developing regional/state maps for such regions/states (South Asia/India), if they contain a quarter of the world's population. Otherwise, users can see exact and finely tuned data on Belize, Malta or Slovenia, but not on West Bengal within India."
I agree with Dr. Kamusella: millions of users in India, Pakistan, Bangladesh (and elsewhere), and their languages, are underrepresented in the current version of Wikimedia Statistics, and in the resulting map(s). Languages spoken in this region are also growing quickly, so it's important to have an accurate picture of their use.
One solution could be to use ISO 3166-2 codes on the API. In fact, because Wikimedia Statistics uses ISO 3166-1, some territories that are also listed as subdivisions in ISO 3166-2 are already supported through their own ISO 3166-1 codes (Svalbard, Aruba, Puerto Rico, French Polynesia, etc.).
This feature could first be introduced only for "big countries" with well-established subdivisions, where it may be easier:
- US states,
- Canadian provinces,
- Russian republics,
- Indian states,
- Chinese provinces (even though the ban on Wikipedia may render this feature useless there).
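To illustrate the kind of analysis this would enable, here is a minimal Python sketch, assuming the data were available as rows keyed by ISO 3166-2 code (for example "IN-WB" for West Bengal). The row format, the helper names `split_code` and `top_edition_by_region`, and all the numbers are hypothetical; nothing here reflects an existing Wikimedia API.

```python
from collections import defaultdict

# Hypothetical rows: (ISO 3166-2 code, Wikipedia edition, views).
# The figures are made up for illustration only.
SAMPLE_ROWS = [
    ("IN-WB", "bn.wikipedia", 120),
    ("IN-WB", "en.wikipedia", 90),
    ("IN-TN", "ta.wikipedia", 40),
    ("IN-TN", "en.wikipedia", 75),
    ("CA-QC", "fr.wikipedia", 60),
    ("CA-QC", "en.wikipedia", 30),
]


def split_code(code):
    """Split an ISO 3166-2 style code such as 'IN-WB' into ('IN', 'WB')."""
    country, _, subdivision = code.partition("-")
    if len(country) != 2 or not subdivision:
        raise ValueError(f"not an ISO 3166-2 style code: {code!r}")
    return country, subdivision


def top_edition_by_region(rows):
    """Return the most viewed edition for each subdivision code."""
    totals = defaultdict(lambda: defaultdict(int))
    for code, edition, views in rows:
        split_code(code)  # reject malformed codes early
        totals[code][edition] += views
    return {
        code: max(editions, key=editions.get)
        for code, editions in totals.items()
    }


if __name__ == "__main__":
    for code, edition in sorted(top_edition_by_region(SAMPLE_ROWS).items()):
        print(code, edition)
```

With subdivision-level rows like these, a map could show a different leading edition for West Bengal than for Tamil Nadu, which a single country-level entry for India cannot do.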
(FYI: I initially mentioned the idea for this feature request here: T257071.)
