
Create harness for A/B test for logged in users
Closed, Resolved · Public · 3 Estimated Story Points

Description

Spun out of T275807

Readers Web will be A/B testing the existing treatment of the language switcher and a new treatment being worked on as part of Desktop Improvements. The initial cohort will be logged-in users only.

AC

  • We should bucket users on their centralized ID (global ID) so that they remain consistently bucketed across sites (e.g. if I switch from English to French, I should stay in the same bucket)
  • When I visit a wiki with the A/B test enabled, I receive either the existing treatment or the new one
  • When I visit a wiki with the A/B test enabled and I have the magic query string parameter set then it should have the following effects:
    Value     | Effect
    undefined | I'm entered into the A/B test and bucketed as usual
    control   | I see the existing treatment
    A         | I see the new treatment
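The override rules in the table above can be sketched as a small function. This is a hypothetical JavaScript illustration, not the Vector implementation (bucketing happens server side, per the developer notes): the name `resolveTreatment` is made up, the bucket labels "control" and "A" come from the table, and treating an unrecognised value like an absent one is an assumption the AC does not specify.

```javascript
/**
 * Hypothetical helper: decide which treatment a user sees, honouring the
 * magic query string parameter from the AC table.
 *
 * @param {string|undefined} overrideValue Raw parameter value, or undefined
 *   when the parameter is absent.
 * @param {string} assignedBucket Bucket from normal bucketing ("control" or "A").
 * @return {string} "control" (existing treatment) or "A" (new treatment).
 */
function resolveTreatment(overrideValue, assignedBucket) {
  if (overrideValue === "control" || overrideValue === "A") {
    // An explicit override always wins over the assigned bucket.
    return overrideValue;
  }
  // Parameter absent: enter the test and bucket as usual.
  // (Assumption: unrecognised values are treated the same way.)
  return assignedBucket;
}

console.log(resolveTreatment(undefined, "A")); // → A
console.log(resolveTreatment("control", "A")); // → control
```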

Developer Notes

  1. Bucketing should happen on the server side
    • Would adding a class to the body tag help with bucketing?
  2. We should bucket users on their centralized ID (global ID) so that they remain consistently bucketed across sites…

This can be achieved with the following:

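// Prefer the central (global) user ID so the bucket is the same on every wiki.
$lookup = CentralIdLookup::factoryNonLocal();
$id = null;
if ( $lookup ) {
  $id = $lookup->centralIdFromLocalUser( $user );
}
// The central ID lookup failed (or no lookup is configured):
// fall back to the local user ID.
if ( !$id ) {
  $id = $user->getId();
}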
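Assuming the parity rule implied by the QA steps (even IDs get the new treatment with the language button in the header, odd IDs keep the existing sidebar), the bucketing that follows the ID lookup above can be sketched in JavaScript. The function names are hypothetical, and the parity rule is inferred from the QA steps rather than confirmed from the patch.

```javascript
// Hypothetical sketch, not the Vector implementation.
// Mirrors the PHP above: prefer the central (global) ID and fall back to
// the local ID when the central lookup fails (null or 0).
function pickBucketingId(centralId, localId) {
  return centralId ? centralId : localId;
}

// Assumed parity rule, inferred from the QA steps: even IDs see the new
// treatment ("A"), odd IDs see the existing one ("control").
function assignBucket(centralId, localId) {
  const id = pickBucketingId(centralId, localId);
  return id % 2 === 0 ? "A" : "control";
}

console.log(assignBucket(0, 3)); // → control
```

Because the central ID takes precedence, a user whose local ID is odd can still land in the new treatment when the central lookup succeeds, which is why checking the parity of the local ID in the browser is not conclusive.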

QA Steps

With Even Logged in User

  1. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree
  2. Log in with username alice and password patchdemo1
  3. In the dev console, run mw.user.getId() to confirm that the user has an even user ID.
  4. Assert that the language button appears and that languages do NOT appear in the sidebar.
  5. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree?languageinheader=0
  6. Assert that languages appear in the sidebar and that the language button does NOT appear

With Odd Logged in User

  1. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree
  2. Log in with username bob and password patchdemo1
  3. In the dev console, run mw.user.getId() to confirm that the user has an odd user ID.
  4. Assert that languages appear in the sidebar and that the language button does NOT appear
  5. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree?languageinheader=1
  6. Assert that the language button appears and that languages do NOT appear in the sidebar.

With Anonymous User (and languageinheader config off for anons)

  1. Log out and assert that you are an anonymous user.
  2. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree
  3. Assert that languages appear in the sidebar and that the language button does NOT appear

QA Results - Beta

QA Results - Prod

AC | Status | Details
1  |        | T280825#7146336
2  |        | T280825#7146336
3  |        | T280825#7146336 (need more info)
4  |        | T280825#7146336
5  |        | T280825#7146336

Event Timeline

Jdlrobson updated the task description.
phuedx updated the task description.

Change 683451 had a related patch set uploaded (by Nray; author: Nray):

[mediawiki/skins/Vector@master] Create A/B test harness for Language in header feature

https://gerrit.wikimedia.org/r/683451

nray removed nray as the assignee of this task. (Apr 29 2021, 12:57 AM)
nray added a subscriber: nray.
nray removed nray as the assignee of this task. (May 4 2021, 2:32 AM)
nray assigned this task to phuedx.
nray removed nray as the assignee of this task. (May 4 2021, 7:04 PM)
nray assigned this task to phuedx.
nray removed phuedx as the assignee of this task.
nray assigned this task to phuedx.

Change 683451 merged by jenkins-bot:

[mediawiki/skins/Vector@master] Create A/B test harness for Language in header feature

https://gerrit.wikimedia.org/r/683451

nray claimed this task.
nray moved this task from Code Review to QA on the Readers-Web-Backlog (Kanbanana-FY-2020-21) board.
nray added a subscriber: Edtadros.

I need to add QA steps

Change 685944 had a related patch set uploaded (by Nray; author: Nray):

[mediawiki/skins/Vector@master] DNM: Config for QA of T280825 via PatchDemo

https://gerrit.wikimedia.org/r/685944

nray updated the task description.

Change 685944 abandoned by Nray:

[mediawiki/skins/Vector@master] DNM: Config for QA of T280825 via PatchDemo

Reason:

https://gerrit.wikimedia.org/r/685944

Status: ✅ PASS
Environment: beta
OS: macOS Big Sur
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

With Even Logged in User

  1. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree
  2. Log in with username alice and password patchdemo1
  3. In the dev console, run mw.user.getId() to confirm that the user has an even user ID.
  4. Assert that the language button appears and that languages do NOT appear in the sidebar.

✅ AC1:

Screen Shot 2021-05-19 at 6.57.07 PM.png

  5. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree?languageinheader=0
  6. Assert that languages appear in the sidebar and that the language button does NOT appear

✅ AC2:

Screen Shot 2021-05-19 at 6.58.24 PM.png

With Odd Logged in User

  1. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree
  2. Log in with username bob and password patchdemo1
  3. In the dev console, run mw.user.getId() to confirm that the user has an odd user ID.
  4. Assert that languages appear in the sidebar and that the language button does NOT appear

✅ AC3:

Screen Shot 2021-05-19 at 6.59.36 PM.png

  5. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree?languageinheader=1
  6. Assert that the language button appears and that languages do NOT appear in the sidebar.

✅ AC4:

Screen Shot 2021-05-19 at 7.00.15 PM.png

With Anonymous User (and languageinheader config off for anons)

  1. Log out and assert that you are an anonymous user.
  2. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree
  3. Assert that languages appear in the sidebar and that the language button does NOT appear

✅ AC5:

Screen Shot 2021-05-19 at 7.01.32 PM.png

Status:
Environment: beta
OS: macOS Big Sur
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

With Even Logged in User

  1. Visit https://en.wikipedia.org/wiki/Albert_Einstein
  2. Log in with an even-numbered user
  3. In the dev console, run mw.user.getId() to confirm that the user has an even user ID.
  4. Assert that the language button appears and that languages do NOT appear in the sidebar.

✅ AC1:

Screen Shot 2021-06-09 at 8.45.03 AM.png

  5. Visit https://en.wikipedia.org/wiki/Albert_Einstein?languageinheader=0
  6. Assert that languages appear in the sidebar and that the language button does NOT appear

✅ AC2:

Screen Shot 2021-06-09 at 8.46.28 AM.png

With Odd Logged in User

  1. Visit https://en.wikipedia.org/wiki/Albert_Einstein
  2. Log in with an odd-numbered user
  3. In the dev console, run mw.user.getId() to confirm that the user has an odd user ID.
  4. Assert that languages appear in the sidebar and that the language button does NOT appear

❌ AC3:

Screen Shot 2021-06-09 at 8.58.32 AM.png

  5. Visit https://en.wikipedia.org/wiki/Albert_Einstein?languageinheader=1
  6. Assert that the language button appears and that languages do NOT appear in the sidebar.

✅ AC4:

Screen Shot 2021-06-09 at 8.59.47 AM.png

With Anonymous User (and languageinheader config off for anons)

  1. Log out and assert that you are an anonymous user.
  2. Visit https://patchdemo.wmflabs.org/wikis/51ef0c9126/wiki/Tree
  3. Assert that languages appear in the sidebar and that the language button does NOT appear

✅ AC5:

Screen Shot 2021-06-09 at 9.01.01 AM.png

@ovasileva I should have tested this long ago. I'm curious whether the issue with AC3 is due to a subsequent change in functionality or a defect. I'm leaving this as ❓ instead of ❌ for now.

A/B test is now live on all wikis except fawiki. @MNeisler - how should we go about QA of the bucketing?

As we discussed in yesterday's Web task sync, I've run a query to see if the treatments are balanced:

[0]

+--------+----------+----------------+---------+---------------+
|   n    | n_header | percent_header | n_other | percent_other |
+--------+----------+----------------+---------+---------------+
| 432955 |   201984 |          46.65 |  230970 |         53.35 |
+--------+----------+----------------+---------+---------------+

select
  n,
  n_header,
  round(100 * n_header / n, 2) as percent_header,
  n_other,
  round(100 * n_other / n, 2) as percent_other
from
(
  select
    sum(1) as n,
    sum(if(event.context = 'header', 1, 0)) as n_header,
    sum(if(event.context = 'other', 1, 0)) as n_other
  from
    universallanguageselector
  where
    year = 2021
    and month = 6
    and day >= 22
    
    and event.action = 'compact-language-links-open'
    and event.context
) as dataset
;

@phuedx - Do you know if there is any way to test the bucketing on the client side?

It's difficult to conclude that the A/B buckets are balanced based on the aggregated data because:

(1) we don't log distinct user visits to the site. Assuming an equal and random split of users across groups, we'd expect a roughly similar number of sessions and events; however, the changes in the language switching feature itself may drive some differences in the number of sessions logged.
(2) Unlike the search AB test where search sessions were initiated the same way (by typing in the search widget), there are different ways to access the language switching features in each group. In the treatment group, you click the new menu button to access all language switching features and in the control group, you can click a language link in the sidebar, a settings cog, or the N more button. This makes direct comparisons between "initiated sessions" difficult.

If not, the best we can likely do is try to compare the number of unique sessions recorded in each test group for a similar action such as clicks to a language feature or language switches (similar to your check in T280825#7177124) and see if there are any significant differences that might indicate an issue in bucketing.

(cc @ovasileva)

I looked at this with @Edtadros, and from my understanding of the code, we can't rely on the user ID being odd or even to check bucketing: the code actually uses the CentralAuth user ID when one is available, and that ID can't be inspected on the client.

Hi @MNeisler, the compact-language-links-open event is only fired when the language button is clicked. For this reason, isn't it a bad indicator of whether the treatments were balanced? To test whether the treatments are balanced, wouldn't it be better to use an event that fires on startup for all page views? (I don't think we have one.)

If anything, doesn't this indicate that when the language button is in the header, it's less likely to be clicked by logged-in users? (That could make sense given discoverability.)

@Jdlrobson - the compact-language-links-open event is also fired when the "N more" button in the sidebar is clicked by users in the control group. We can distinguish between these two events using the event.context field.

However, I agree it's still a bad indicator of whether the treatments are balanced, because there are several different ways a session is initiated in the control group: (1) clicking the N more button, (2) clicking a language link in the sidebar, or (3) clicking the settings cog, while in the treatment group everyone has to click the new language button to access all of these features. As a result, there's no good way to compare the two groups directly based on the total number of logged sessions alone, since the changes to the feature itself might be driving some of the differences we see between the two groups.

We don't have an event that fires on startup for every page view, so we really can't determine whether the buckets are balanced based on the aggregated data alone. I can only confirm that we are logging all expected events from each group, and that the events we're seeing for logged-in and logged-out users appear roughly as expected given the A/B test deployment dates. If not already done and still possible, I'd recommend double-checking on the client side that a user bucketed into the control or treatment group sees all the correct features and that events fire as expected.

I don't think this is possible more than the testing we have run already per my comment above - we can't verify the ID used on the client. That said, there's very little that could go wrong.

If we want confidence in the A/B bucketing, perhaps we could retroactively add an init event to verify this that we could run after the A/B test?
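If such an init event existed, one common way to check bucketing is a sample ratio mismatch (SRM) test: compare the number of distinct users logged per bucket against the expected 50/50 split with a chi-square goodness-of-fit test. A minimal sketch, assuming per-bucket counts of distinct users are available; the function name and the p < 0.001 threshold are illustrative choices, not part of the existing instrumentation.

```javascript
// Hypothetical SRM check for a 50/50 A/B split.
// Inputs are counts of distinct users per bucket, e.g. from an init event
// logged once per user (no such event exists today).
function sampleRatioMismatch(nControl, nTreatment) {
  const total = nControl + nTreatment;
  const expected = total / 2; // an equal split is expected in each bucket
  // Chi-square goodness-of-fit statistic with 1 degree of freedom.
  const chi2 =
    (nControl - expected) ** 2 / expected +
    (nTreatment - expected) ** 2 / expected;
  // 10.828 is the chi-square critical value for p = 0.001 at 1 df.
  return { chi2, mismatch: chi2 > 10.828 };
}

console.log(sampleRatioMismatch(5000, 5100).mismatch); // → false
```

Fed the compact-language-links-open counts from the query above, this would report a mismatch, but as discussed those counts reflect clicks rather than bucket assignments, so the test is only meaningful on a per-user event such as an init event.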

Jdlrobson assigned this task to ovasileva.

I've opened up next steps in T286932
@ovasileva can this now be closed?