Maniphest T142070

For geolocation in Compact Language Links, consider the languages of the wiki that are most frequently visited in a country
Closed, Resolved · Public
Authored By

	Amire80
	Aug 4 2016, 7:40 AM

Description

This idea comes from the Russian Wikipedia discussion about Compact Language Links: the territory-language data in CLDR doesn't necessarily correspond to the popularity of Wikimedia projects in each country. In Russia, for example, English is not listed as one of the languages in the CLDR data, but the English Wikipedia is the second most popular edition according to the per-country reading breakdown at stats.wikimedia.org (albeit with a modest 6%).

So the data from CLDR can be augmented with site-visit statistics from stats.wikimedia.org (or, hopefully, something more modern and detailed).
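
A minimal sketch of what such augmentation could look like, assuming the CLDR data is available as an ordered list of language codes per country and the visit statistics as a mapping from language code to share of page views. The function name, the `min_share` threshold, and the input values are all hypothetical; only the 6% figure for English in Russia comes from the text above, and nothing here reflects how ULS is actually implemented.

```python
def augment_territory_languages(cldr_languages, view_shares, min_share=0.05):
    """Append popular wiki languages that are missing from the CLDR list.

    cldr_languages: ordered list of language codes from CLDR (assumed input shape).
    view_shares: mapping of language code -> fraction of page views in the
        country (assumed input shape).
    min_share: hypothetical cut-off below which a language is ignored.
    """
    result = list(cldr_languages)
    # Keep only languages above the threshold that CLDR does not already list,
    # ordered from most to least visited.
    extras = sorted(
        (lang for lang, share in view_shares.items()
         if share >= min_share and lang not in result),
        key=lambda lang: view_shares[lang],
        reverse=True,
    )
    return result + extras


# Placeholder inputs for illustration only, not real CLDR or traffic data,
# except the 6% share for English mentioned in the description.
cldr_ru = ["ru"]
shares_ru = {"ru": 0.90, "en": 0.06, "uk": 0.01}
print(augment_territory_languages(cldr_ru, shares_ru))  # ['ru', 'en']
```

CLDR languages keep their original order and priority here; the statistics only add candidates, so a noisy traffic figure cannot remove a language that CLDR already lists.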

Event Timeline

Amire80 created this task. Aug 4 2016, 7:40 AM

Restricted Application added a project: UniversalLanguageSelector. Aug 4 2016, 7:40 AM

Restricted Application added a subscriber: Aklapper.

I suggest closing this as invalid; it is handled upstream.

Do you mean CLDR?

Yes. You can learn more from the FAQ: https://www.mediawiki.org/wiki/ULS/FAQ#language-territory

Jack_who_built_the_house subscribed. Aug 9 2016, 7:51 PM

I think a big part of this will be taken care of by the CLDR ticket for the EU: http://unicode.org/cldr/trac/ticket/9680

However, I went through https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm up to Slovene (sl) (0.03% share of global total traffic) and filed some additional reports, so I'm marking this resolved. It would be great if you helped by adding some sources:

Thai http://unicode.org/cldr/trac/ticket/9698
Norwegian http://unicode.org/cldr/trac/ticket/9699
Danish http://unicode.org/cldr/trac/ticket/9700
Tagalog/Filipino http://unicode.org/cldr/trac/ticket/9701
Serbian http://unicode.org/cldr/trac/ticket/9702
USA http://unicode.org/cldr/trac/ticket/9703

The biggest/most obvious cases are already covered by CLDR, such as: pt in Angola, sv in Finland, ko in China, hu in Romania, Slovakia, Serbia and Ukraine, uk in Poland, el in Cyprus, ms in Singapore and Indonesia, et in Finland.

We might want to check whether some of the data for Ireland is a glitch, maybe caused by Facebook servers (over 10% of the traffic to the Bengali and Tagalog Wikipedias comes from Ireland). Someone not afraid of statistical errors in the least trafficked sites may also check further down the list.

Finally, it would be useful to look for reliable sources for data such as Asian diasporas and the numbers and distribution of L2 speakers.

Nemo_bis added a project: Upstream. Aug 15 2016, 11:11 AM

Thank you so much for taking care of this upstream, @Nemo_bis!

Amire80 moved this task from Backlog to Compact Language Links on the UniversalLanguageSelector board. Nov 1 2017, 2:56 PM

Amire80 moved this task from Backlog to Prioritised languages on the ULS-CompactLinks board. Nov 3 2017, 8:24 AM
