
[EPIC] [PAB3] Build A/B test for language detection on the Portal to reorder and resort the primary links on the page to suit the user better
Closed, Resolved (Public)

Description

Build an A/B test that will test a feature which detects the user's language and re-sorts the links around the globe on the Wikipedia portal in line with their preferred language settings. The user's preferred language will be displayed in the top left link.

If they do not have as many language preferences as there are available links to display, fill the remaining links around the globe with the "top" links that are not in their language preferences.

If they do not have a header (lang pref) or we cannot retrieve it, do a full stop and give them the default experience.
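A minimal sketch of the re-sorting rule described above, assuming language codes are plain strings and that every preferred language has a link available. The function name `reorderLinks` and its parameters are hypothetical, not the Portal's actual code.

```javascript
// Hypothetical helper: put the user's preferred languages first, then fill
// the remaining slots with the default "top" links not already listed.
function reorderLinks(preferred, defaultLinks) {
  // No header / nothing retrievable: full stop, default experience.
  if (!Array.isArray(preferred) || preferred.length === 0) {
    return defaultLinks.slice();
  }
  const seen = new Set();
  const ordered = [];
  for (const code of [...preferred, ...defaultLinks]) {
    if (!seen.has(code)) {
      seen.add(code);
      ordered.push(code);
    }
  }
  // Keep the same number of links as the default layout.
  return ordered.slice(0, defaultLinks.length);
}
```

For example, `reorderLinks(["en", "fr"], ["en", "ja", "es", "de"])` yields `["en", "fr", "ja", "es"]`, while an empty preference list returns the default order unchanged.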

We also want to count how many users go directly to the search box - with or without a language preference set in their browser.

Sample logic should look like the following:

  • 1 in 200 people are included in EventLogging
  • Of those 1 in 200 people, 1 in 10 are included in the test
  • Of the people included in the test, half go in a test group, with the cohort name "language-detection-b", and half go in a control group, with the cohort name "language-detection-a"
  • Everyone else in the EventLogging sample gets a NULL cohort (the string null, or the MySQL null, we can detect either).
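The sampling tiers above could be sketched roughly as follows. This is only an illustration: the function name, the return shape, and the reading that the NULL cohort covers users who are logged but not in the test are assumptions. The random draws are parameters so the logic can be checked deterministically.

```javascript
// Hypothetical sketch of the three sampling tiers.
const LOGGING_RATE = 1 / 200; // 1 in 200 visitors are included in EventLogging
const TEST_RATE = 1 / 10;     // of those, 1 in 10 are included in the test

function assignCohort(
  rLogging = Math.random(),
  rTest = Math.random(),
  rGroup = Math.random()
) {
  if (rLogging >= LOGGING_RATE) {
    // Not sampled at all: nothing is logged for this visitor.
    return { logged: false, cohort: null };
  }
  if (rTest >= TEST_RATE) {
    // Logged, but outside the test: NULL cohort.
    return { logged: true, cohort: null };
  }
  // Inside the test: split evenly between control (a) and test (b).
  const cohort = rGroup < 0.5 ? "language-detection-a" : "language-detection-b";
  return { logged: true, cohort };
}
```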

Event Timeline

Ironholds raised the priority of this task from to Needs Triage.
Ironholds updated the task description.
Ironholds added a project: Wikimedia-Portals.
Ironholds subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper.
Ironholds_backup moved this task from Maps to WDQS on the Discovery-ARCHIVED board.
Ironholds_backup moved this task from WDQS to Analysis on the Discovery-ARCHIVED board.
Ironholds_backup moved this task from Analysis to UX on the Discovery-ARCHIVED board.
Deskana renamed this task from Build A/B test for language detection on the Portal to [EPIC] Build A/B test for language detection on the Portal. Jan 19 2016, 8:26 PM
Deskana triaged this task as High priority.
Deskana added a project: Epic.
Deskana subscribed.

Let's use this as the epic for coordinating the test.

Deskana renamed this task from [EPIC] Build A/B test for language detection on the Portal to [EPIC] Build A/B test for language detection on the Portal to reorder and resort the primary links on the page to suit the user better. Jan 19 2016, 8:31 PM
debt renamed this task from [EPIC] Build A/B test for language detection on the Portal to reorder and resort the primary links on the page to suit the user better to [EPIC] [PAB2] Build A/B test for language detection on the Portal to reorder and resort the primary links on the page to suit the user better. Jan 19 2016, 11:25 PM
debt renamed this task from [EPIC] [PAB2] Build A/B test for language detection on the Portal to reorder and resort the primary links on the page to suit the user better to [EPIC] [PAB3] Build A/B test for language detection on the Portal to reorder and resort the primary links on the page to suit the user better. Jan 20 2016, 8:27 PM

Based upon a question that came up earlier today, I confirmed with @Ironholds that this test will track (and count) the user's preferred language but will not track how many times the links around the globe will be re-arranged. That particular information (about the re-arranged links) can be inferred from how many preferred languages are not English.

> Based upon a question that came up earlier today, I confirmed with @Ironholds that this test will track (and count) the user's preferred language but will not track how many times the links around the globe will be re-arranged. That particular information (about the re-arranged links) can be inferred from how many preferred languages are not English.

I don't agree.

Here's my use case:

I am French and my preferred Wikipedia is the French Wikipedia.
However, my preferred language for browsing is English.

My preferred languages are set as: English, French (English first, French second).
As a user, I expect to see the page www.wikipedia.org in English.
But as a user, I will most likely click on the French Wikipedia because I like to read the encyclopedia in my native language.

So about me as a user, the data collection will basically look like:

  • My preferred language is French when I clicked on the French wiki
  • My preferred language is English when I clicked on the English wiki
  • My preferred language is ??? when I didn't click on a wiki

A data analysis leading to the conclusion that my preferred language is French is wrong.

This is why I am saying we are currently NOT collecting the user's preferred language.

The data we collect gives us an incorrect representation of people's preferred language.
At best, it gives us a representation of people's preferred SEARCH/LEARNING language (preferred encyclopedia language), NOT the preferred BROWSING language (the language expected for www.wikipedia.org).

If we wanted to collect the user's preferred BROWSING languages, we would just log navigator.language or navigator.languages to collect the list of preferred languages (in preference order).
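As a rough illustration of that suggestion, the helper below reads the ordered preference list from a navigator-like object, falling back to the single `language` value where `languages` is unavailable. The name `getBrowserLanguages` is hypothetical; in a browser it would be called with the global `navigator`.

```javascript
// Hypothetical helper: return the browser's preferred languages in
// preference order. `nav` is a navigator-like object, passed in so the
// function can also be exercised outside a browser.
function getBrowserLanguages(nav) {
  if (Array.isArray(nav.languages) && nav.languages.length > 0) {
    return Array.from(nav.languages);
  }
  // Fallback for environments that only expose navigator.language.
  return nav.language ? [nav.language] : [];
}
```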

> That particular information (about the re-arranged links) can be inferred from how many preferred languages are not English.

I don't agree.

A couple of examples:

Original Top10: English / Japanese / Spanish / Deutsch

1/ Preferred languages: English, French
Top10 is re-arranged: English / FRENCH / Japanese / Spanish

When I click on the English wiki, you will consider "English" to be my preferred language, and thus not know that the top10 was re-arranged.

2/ Preferred languages: English, Japanese
Top10 is not re-arranged (order matches preferred languages): English / Japanese / Spanish / Deutsch

When I click on the Japanese wiki, you will consider "Japanese" to be my preferred language, and thus consider that the top10 was re-arranged. But it wasn't.

I don't understand your example. If you'd rather read the French Wikipedia, then your preferred language is...French. For the purposes of browsing Wikipedia, at least.

It would be helpful if you could point out solutions here as well as issues?

I also am a bit confused by your example.

This test is to detect the preferred language - just the first one, even if there are more - and put that language link in the top left hand link position around the globe.

I realize that some people will have multiple languages in their 'preferred' language list, but I don't think the majority of people that use the portal site will have more than one language.

As someone who has more than 1 language set in my browser (en, pl) I can relate to @JGirault's use case.

First off, I'd like to state that event-logging already collects the user's preferred language by default. The 'event_accept_language' column in event-logging uses the first value from the HTTP 'Accept-Language' request header to set that value. That value is derived from the user's browser settings, the same settings we're using to move the links around (though we're using all the values, not just the first).
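To make "the first value from the header" concrete, here is a small sketch of pulling the first language tag out of an Accept-Language string. It is an assumption about what such extraction looks like, not EventLogging's actual code, and `firstAcceptLanguage` is a hypothetical name.

```javascript
// Hypothetical helper: return the first language tag in an Accept-Language
// header value, dropping any ";q=" weight. Returns null if there is none.
function firstAcceptLanguage(header) {
  if (typeof header !== "string" || header.trim() === "") {
    return null;
  }
  const first = header.split(",")[0];    // e.g. "en-US" or "pl;q=0.9"
  const tag = first.split(";")[0].trim(); // strip the quality value
  return tag === "" ? null : tag;
}
```

With a header such as `en-US,en;q=0.8,fr;q=0.6`, this returns `en-US`, so a second preference like French never reaches the logged column.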

In Julien's case, the 'event_accept_language' value would be 'en'. However, because Julien clicks on the French Wikipedia, we would interpret his 'real' preferred language as 'fr'. Whether or not we call this the 'preferred' or 'search/learning' language is just semantics to me.

With the A/B test as it is, though, we would be gathering the same data for Julien in both the control group and the test group, because French already exists in the top ten, and we don't know if moving it around actually helped him click on the French link or not (unless he didn't click on any links in the control group, but then clicked on the link in the test group).

On a global level though, we will be able to tell if there was a general increase in the clickthrough rate of the top-ten, which we can then attribute to moving the links around and adding different languages to it. This was my initial understanding of the goal of this test, and I still think that's in line with our general goal of increasing engagement on this page.

After chatting with @Jdrewniak - I think we're on the right track here. To summarize:

  1. We'll capture the user's browser's preferred language(s)
  2. We'll re-organize the links around the globe to display the preferred language(s) starting at the top left most position.
  3. If there is more than one preferred language, we'll slot that into the second position and so on.
  4. Existing event logging will continue as is - with only the first preferred language tracked.
  5. We'll also continue the existing logging of the user's clickthrough from the portal.

We're especially interested in if a user's preferred language is EN but they click on the Espanol link to get to that wiki.

@Ironholds and @mpopov please let us know if that is good from an analysis standpoint.

> After chatting with @Jdrewniak - I think we're on the right track here. To summarize:
>
>   1. We'll capture the user's browser's preferred language(s)
>   2. We'll re-organize the links around the globe to display the preferred language(s) starting at the top left most position.
>   3. If there is more than one preferred language, we'll slot that into the second position and so on.
>   4. Existing event logging will continue as is - with only the first preferred language tracked.

Currently we track all primary links. Can you explain what you mean by "only the first preferred language tracked"?

>   5. We'll also continue the existing logging of the user's clickthrough from the portal.
>
> We're especially interested in if a user's preferred language is EN but they click on the Espanol link to get to that wiki.

Personally, I think we should be interested in all preferred languages. I don't see why we should limit it to 1.
The above plan only cares about 1 preferred language, not the full list.

I have one related concern with the above plan. If my preferred languages are EN and FR, and I click on ES, you:

  • will know that I was interested in ES instead of my most preferred language EN
  • can determine if ES was in initial list of top10 wikis.
    • if yes, you don't know if ES was in my list of preferred languages (as secondary, tertiary...)
    • If not, you know that ES was in my list of preferred languages (as secondary, tertiary...).

With my point in bold, you don't know whether the A/B test had an impact on clickthrough or not.

> @Ironholds and @mpopov please let us know if that is good from an analysis standpoint.

hey @JGirault,
by "only the first preferred language tracked" I think @debt is referring to the 'event_accept_language' column in event-logging, which only uses the first language in the 'accept language' header.

I agree that it would be good if we tracked all of the user's preferred languages. To do that we would have to modify the event-logging schema, which might have some consequences for analytics.

However, I just did some research and discovered that even if we log the equivalent of navigator.languages, that might not be entirely useful. I created a new user on my machine with Polish as their language, then I downloaded the Polish-language Firefox and Chrome.
I noticed that by default, navigator.languages in Firefox is ["pl", "en-US", "en"] and in Chrome it's ["pl-PL", "pl", "en-US", "en"].
How did English get in there? It seems that at least these two browsers add English to the preferred language list as a fallback, by default. Given that most people don't change these settings, we would end up logging a lot of 'en' language preferences that aren't really the user's preference.

To your third point, it's true that when a user has multiple preferred languages, we won't know for sure if the link the user clicks on was actually one of their preferred languages or not. I think we can fix this.

Here's an idea:
We can append a hash like "#preferred" to the end of the clickthrough URL if the language is in the user's preferred language list. We don't even have to append the hash to the URL itself, only to the URL string when we send the data to event-logging.
That way, as to your example, if your languages are [EN, FR] and you click on FR, we'll log http://fr.wikipedia.org/#preferred and if you click on ES, we'll just log http://es.wikipedia.org as the clickthrough URL.
This will tell us if people are actually clicking on their preferred languages or not, without having to determine if those languages are already in the top ten. If we want to get even more detailed, we could append '#preferred-new' to preferred languages that are not in the top ten.
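A sketch of how that tagging might look, assuming we know the clicked link's language code, the user's preferred list, and the default top-ten list. The function name and parameters are hypothetical; only the logged string is changed, not the link the user actually follows.

```javascript
// Hypothetical helper: build the destination string sent to event-logging.
// - clicked language not preferred          -> URL unchanged
// - preferred and already in the top ten    -> URL + "#preferred"
// - preferred but not in the default top ten -> URL + "#preferred-new"
function tagDestination(url, clickedLang, preferred, topTen) {
  if (!preferred.includes(clickedLang)) {
    return url;
  }
  return topTen.includes(clickedLang) ? url + "#preferred" : url + "#preferred-new";
}
```

With preferred languages `["en", "fr"]`, a click on the French link would be logged as `http://fr.wikipedia.org/#preferred`, while a click on the Spanish link would be logged as plain `http://es.wikipedia.org`.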

After the above discussion, we decided to log all of the user's browser languages, not just the first one. It turns out the 'event_accept_language' field was not automatically populated by the server, and this change was easy to implement on our end.

@mpopov This is an example of the event-logging data we'll be sending for the test. This example excludes data that is automatically populated (uuid, timestamp, clientIp, userAgent). This is a clickthrough event of a user who is in the A/B test 'test' group (as opposed to the control group), whose browser languages are set to English and French, and who clicked on the French Wikipedia top-link:

{
  "event": {
    "session_id": "1d2618c3c2d33601",
    "event_type": "clickthrough",
    "accept_language": "en,fr",
    "cohort": "language-detection-b",
    "country": "PL",
    "destination": "https://fr.wikipedia.org/",
    "section_used": "primary links"
  },
  "revision": 14377354,
  "schema": "WikipediaPortal",
  "webHost": "www.wikipedia.org",
  "wiki": "metawiki"
}

The language codes will be comma-separated and ISO 639-1 formatted, which means, for example, that English will appear as 'en' instead of 'en-GB' or 'en-US' like it does now. Other than that it's business as usual.
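One way to produce such a value from a browser language list is sketched below. It keeps only the primary subtag of each entry and drops duplicates; the de-duplication step and the name `toPrimaryLanguageCodes` are assumptions rather than the Portal's confirmed behaviour. Note that a primary subtag is usually, but not always, a two-letter ISO 639-1 code.

```javascript
// Hypothetical helper: turn ["pl-PL", "pl", "en-US", "en"] into "pl,en".
function toPrimaryLanguageCodes(tags) {
  const seen = new Set();
  const codes = [];
  for (const tag of tags) {
    // "en-GB" -> "en"; the region subtag is discarded.
    const primary = String(tag).split("-")[0].toLowerCase();
    if (primary !== "" && !seen.has(primary)) {
      seen.add(primary);
      codes.push(primary);
    }
  }
  return codes.join(",");
}
```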

I would however, suggest another slight change.
In the past we inadvertently changed the NULL cohort group -- which is the group that is excluded from the test, but still logged for the dashboards -- from 'null' to 'baseline'. @Ironholds accommodated this change by modifying the dashboard code, and we would like to continue to refer to this group as 'baseline' instead of 'null'. The reason is that the browser localStorage we use to persist the user's test group converts the value null to the string 'null', which makes it confusing to test whether the value 'null' was set intentionally or not.
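The ambiguity can be reproduced with a tiny stand-in for localStorage. `FakeStorage` below is a hypothetical in-memory class that mimics the relevant behaviour of the Web Storage API (values are coerced to strings, and a missing key reads back as null), and the key name "cohort" is also just illustrative.

```javascript
// Minimal stand-in for window.localStorage, for illustration only.
class FakeStorage {
  constructor() {
    this.items = new Map();
  }
  setItem(key, value) {
    // Web Storage stores strings, so null becomes the string "null".
    this.items.set(String(key), String(value));
  }
  getItem(key) {
    // A key that was never set reads back as the value null.
    return this.items.has(String(key)) ? this.items.get(String(key)) : null;
  }
}

const storage = new FakeStorage();
storage.setItem("cohort", null);
const stored = storage.getItem("cohort");     // the string "null"
const missing = storage.getItem("never-set"); // the value null
```

Storing an explicit label such as 'baseline' avoids having to tell the string "null" apart from a key that was never written.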

@Jdrewniak Looks good! And yeah, "baseline" or "control" is good, especially given that "null" bug!

As part of this test, we added in this: https://phabricator.wikimedia.org/T124486 to localize the language for The Free Encyclopedia phrase underneath the Wikipedia wordmark.

This A/B test went into production on March 22, 2016 and is expected to run for a week, for data collection and analysis purposes.

It might run a bit longer than a week, due to vacation schedules, but we'll only use the data from the first 7 full days for analysis.

Update: we might need to use more than 7 days to get a good sampling, so we will probably keep this test running for up to a 2 week period. The main issue is that the affected population of interest (people who do not list English as their primary accept-language and people who don't list English at all) is a much smaller subset of both the control and test groups, according to @mpopov. With a smaller overall count of visitors that hit the test, we would need to rework the sampling logic. We will re-evaluate at the end of the first 7 days of this test to see if we hit the minimum counts needed.

debt moved this task from Done to Completed on the Discovery-Portal-Sprint board.

Closing this successful test. It will be promoted to production with this ticket: https://phabricator.wikimedia.org/T133432