
Determine frequency of usage of language variants
Open, Low, Public

Description

Background

We would like to learn more about how people use language variants. In particular:

  • How often are language variants used per session?
  • How does this compare to language switching (using the ULS) overall?

Wikis with language variants:

  • Chinese Wikipedia
  • Serbian Wikipedia
  • Kazakh (kk) Wikipedia
  • Kurdish (ku) Wikipedia
  • Uzbek (uz) Wikipedia
  • Tajik (tg) Wikipedia

Event Timeline

ovasileva raised the priority of this task from Medium to High. Sep 8 2020, 1:47 PM
VulpesVulpes825 added a subscriber: VulpesVulpes825.

Adding MediaWiki-Language-converter, since it is the part of core that is responsible for language conversion. Adding Platform Engineering on the assumption that they are the current maintainers of the language converter.

How often are language variants used per session

Due to T54429, Google serves both language-variant URLs and canonical URLs randomly in its search results. This will make any research conclusion about language variant usage by logged-out users inaccurate and unusable. For example, to work around this issue, Chinese Wikipedia shows a warning when the content language of a page is zh-Hans, zh-Hant, or unconverted.

How does this compare to language switching (using the ULS) overall?

I do not quite understand the purpose of this question, as ULS does not control Language Converter or the content language setting. For example, if you set ULS to zh-Hans-CN, any page on Chinese Wikipedia will still be unconverted unless you set the content language in the settings or manually select zh-Hans-CN in the variant drop-down menu on the page (see T224701 for details). In other words, unless ULS takes over the content language setting of the language converter, the comparison is meaningless. But since it is in the description, I am adding the corresponding UniversalLanguageSelector and Language-Team.

I would suggest adding an analysis of the percentage of users on each language variant for each language (such as how many users choose the zh-Hans-CN content language setting on Chinese Wikipedia). This analysis could be somewhat helpful for tasks like T250604.

Also, I am a little curious why #Desktop-improvement is involved. If it is for the language switching feature, then what MobileFrontend currently does for language variants should be sufficient.

daniel added a subscriber: daniel.

Nothing to do here for Platform Engineering right now, moving to "tracking"

ovasileva lowered the priority of this task from High to Medium. Oct 29 2020, 8:04 PM

OK, I can try to reply.

It's difficult to give a precise number here. The only languages from this list for which we can have useful numbers are Chinese and Serbian. The rest are (unfortunately!) so small that statistics about them are negligible.

The most relevant metric we have for this in ULS at the moment is the count of searches for a language in the ULS search box that didn't yield any result. For the last three months, this was logged 1,678,842 times. Out of this number, this is the count of searches for these languages:

  • Traditional Chinese: 2300 (~0.14%)
  • Simplified Chinese: 2400 (~0.14%)
  • Serbian Cyrillic: 700 (0.04%)
  • Serbian Latin: 6400 (0.38%; Latin is more common in real life)

These percentages may look small, but do remember that we have more than 300 languages (and people also search for languages we don't [yet] support!). To me this indicates that enough people are trying to use ULS to switch variants that we should consider making it possible to use ULS for that.
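For anyone who wants to recheck the shares quoted above, here is a minimal sketch of the arithmetic. It uses only the total and the rounded counts given in this comment; the variable and function names are made up for illustration, and because the counts are rounded to the nearest hundred the results are approximate.

```python
# Total number of logged "no result" searches in the ULS search box
# over the three-month period, as quoted in the comment.
TOTAL_FAILED_SEARCHES = 1_678_842

# Rounded counts quoted in the comment (names are illustrative only).
variant_search_counts = {
    "Traditional Chinese": 2300,
    "Simplified Chinese": 2400,
    "Serbian Cyrillic": 700,
    "Serbian Latin": 6400,
}


def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


for name, count in variant_search_counts.items():
    share = share_percent(count, TOTAL_FAILED_SEARCHES)
    print(f"{name}: {count} ({share:.2f}%)")
```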

All the above DOESN'T include switching language variants using something other than ULS: the tab at the top of the page, the user preferences (relevant only for logged-in users), and simply coming from search results to a page in the relevant variant. The latter method is probably the most common way to arrive at variants (both default and non-default), but I don't know what the best way to measure that is.

Thanks, @Amire80!

All the above DOESN'T include switching language variants using something other than ULS: the tab at the top of the page, the user preferences (relevant only for logged-in users), and simply coming from search results to a page in the relevant variant. The latter method is probably the most common way to arrive at variants (both default and non-default), but I don't know what the best way to measure that is.

If we want to know how often different variants are used, the pageview_hourly stream looks like it breaks down views by language variant. But it sounds like this task is about how often people switch once they're in the Wikipedia ecosystem, so that's not as relevant.

So, since it sounds like switching variants is not instrumented, the options are probably:

  1. Set up new instrumentation for this.
  2. Look through the webrequest stream for entries where the request URL shows one variant but the referrer URL shows another. It wouldn't let you connect switches to a particular session, but it should allow you to get the overall number.
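Option 2 could be sketched roughly as follows. This is only an illustration under several assumptions, not a query against the real webrequest data: it assumes a variant shows up either as the first path segment (for example /zh-cn/Title) or as a ?variant= query parameter, it treats a URL with no explicit variant as "default", and the KNOWN_VARIANTS set and all function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Assumption: a small, hand-picked subset of variant codes for illustration.
KNOWN_VARIANTS = {"zh-hans", "zh-hant", "zh-cn", "zh-tw", "zh-hk", "sr-ec", "sr-el"}


def variant_of(url: str) -> Optional[str]:
    """Return the variant a URL explicitly points at, or None for the default."""
    parsed = urlparse(url)
    # Assumed form 1: ...?variant=zh-cn
    query_variant = parse_qs(parsed.query).get("variant")
    if query_variant:
        return query_variant[0].lower()
    # Assumed form 2: /zh-cn/Title
    first_segment = parsed.path.lstrip("/").split("/", 1)[0].lower()
    if first_segment in KNOWN_VARIANTS:
        return first_segment
    return None


def is_variant_switch(request_url: str, referrer_url: str) -> bool:
    """True if both URLs are on the same host but show different variants."""
    request_host = urlparse(request_url).hostname
    referrer_host = urlparse(referrer_url).hostname
    # Ignore external referrers (e.g. search engines) and missing referrers.
    if not referrer_host or request_host != referrer_host:
        return False
    return variant_of(request_url) != variant_of(referrer_url)


def count_switches(rows: Iterable[Tuple[str, str]]) -> int:
    """Count (request_url, referrer_url) pairs that look like a variant switch."""
    return sum(1 for request_url, referrer_url in rows if is_variant_switch(*rows_pair)
               ) if False else sum(
        1 for request_url, referrer_url in rows
        if is_variant_switch(request_url, referrer_url)
    )
```

As noted in the option itself, this only yields an overall count. It cannot tie switches to a session, and a navigation from a default URL to an explicit variant is counted as a switch here, which may or may not be what is wanted.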

From the description:

How does this compare to language switching (using the ULS) overall?

@MNeisler does this mean switching from one project to another using the interlanguage links or changing the interface language within one project using ULS? The stats Amir shared address the latter, and the full data is available in the ULS stream. If it's the former, I know Amir collects data on interlanguage link use but I'm not sure where he gets it.

So, since it sounds like switching variants is not instrumented, the options are probably:

  1. Set up new instrumentation for this.

This is a good thing to do in any case. It's an important feature for several languages, and it was never properly researched, and it should be.

From the description:

How does this compare to language switching (using the ULS) overall?

@MNeisler does this mean switching from one project to another using the interlanguage links or changing the interface language within one project using ULS? The stats Amir shared address the latter, and the full data is available in the ULS stream. If it's the former, I know Amir collects data on interlanguage link use but I'm not sure where he gets it.

It's here: https://language-reportcard.wmflabs.org/interlanguage/#desktop . Note that it's desktop only. There really should be such a thing for mobile platforms, too, and more generally, there should be more technology sharing on language switching (and variant switching) between desktop and mobile platforms.

Thanks @Amire80 and @nshahquinn-wmf!

So, since it sounds like switching variants is not instrumented, the options are probably:

  1. Set up new instrumentation for this.
  2. Look through the webrequest stream for entries where the request URL shows one variant but the referrer URL shows another. It wouldn't let you connect switches to a particular session, but it should allow you to get the overall number.

Per discussions with @ovasileva, we plan to look through the webrequest stream to determine the overall number of switches from one variant to another. While this won't give us per-session data, it will give us a basic understanding of overall usage frequency pending new instrumentation. Moving this to the upcoming quarter based on current team priorities.

ovasileva lowered the priority of this task from Medium to Low. Sep 2 2021, 4:43 PM