
For geolocation in Compact Language Links, consider the languages of the wiki that are most frequently visited in a country
Closed, Resolved · Public

Description

This idea comes from the Russian Wikipedia discussion about Compact Language Links: the territory-language data in CLDR doesn't necessarily correspond to the popularity of Wikimedia projects in each country. In Russia, for example, English is not listed as one of the languages in the CLDR data, but the English Wikipedia is the second most popular edition according to the per-country reading breakdown at stats.wikimedia.org (albeit with a modest 6% share).

So the data from CLDR can be augmented with the site-visit statistics from stats.wikimedia.org (or, hopefully, from something more modern and detailed).
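
A minimal sketch of what this augmentation could look like, assuming a per-country table of pageview shares is available. The function name `augment_languages`, the 5% threshold, and the sample tables are all hypothetical; the only figure taken from this task is the 6% share of English in Russia, and the other numbers are placeholders, not real CLDR or stats.wikimedia.org values.

```python
def augment_languages(cldr_langs, traffic_share, min_share=0.05):
    """Return the CLDR languages, followed by any wiki languages whose
    share of the country's pageviews is at least min_share.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    result = list(cldr_langs)
    # Visit the most-read languages first so that the appended ones
    # keep their popularity order.
    ranked = sorted(traffic_share.items(), key=lambda kv: kv[1], reverse=True)
    for lang, share in ranked:
        if share >= min_share and lang not in result:
            result.append(lang)
    return result


# Placeholder inputs: only the 0.06 share for "en" comes from the task text.
cldr_by_territory = {"RU": ["ru"]}
traffic_by_territory = {"RU": {"ru": 0.90, "en": 0.06, "uk": 0.01}}

languages_ru = augment_languages(
    cldr_by_territory["RU"], traffic_by_territory["RU"]
)
print(languages_ru)
```

With these placeholder inputs, English is appended after the CLDR entry for Russia, while a language below the threshold is left out.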

Event Timeline

Amire80 created this task. Aug 4 2016, 7:40 AM
Restricted Application added a project: UniversalLanguageSelector. Aug 4 2016, 7:40 AM
Restricted Application added a subscriber: Aklapper.

I suggest closing this as invalid; it is handled upstream.

Do you mean CLDR?

Nemo_bis closed this task as Resolved. Edited Aug 15 2016, 11:10 AM
Nemo_bis claimed this task.
Nemo_bis triaged this task as Low priority.

I think a big part of this will be taken care of by the CLDR ticket for the EU: http://unicode.org/cldr/trac/ticket/9680

However, I went through https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm as far as Slovene (sl) (0.03% share of global total traffic) and filed some additional reports, so I'm marking this resolved. It would be great if you helped by adding some sources:

The biggest/most obvious cases are already covered by CLDR, such as: pt in Angola, sv in Finland, ko in China, hu in Romania, Slovakia, Serbia and Ukraine, uk in Poland, el in Cyprus, ms in Singapore and Indonesia, et in Finland.

We might want to check whether some of the data for Ireland is a glitch, maybe caused by Facebook servers (over 10% of the traffic to the Bengali and Tagalog Wikipedias comes from Ireland). Someone not afraid of statistical errors in the least-trafficked sites may also check further down the list.

Finally, it would be useful to look for reliable sources for data such as Asian diasporas and the numbers and distribution of L2 speakers.

Thank you so much for taking care of this upstream, @Nemo_bis!