
Sticky header: Create A/B test schema and tie to sticky header feature
Closed, Resolved · Public · 8 Estimated Story Points

Description

Background

Based on T287709: [SPIKE] Explore instrumentation for sticky header work and subsequent discussion, we will be creating a schema that tracks whether the user is in the control or test bucket of the sticky header A/B test.

Acceptance criteria

  • Set up the A/B test schema.
  • Add a configuration option to Vector for controlling whether the sticky header is shown.
  • When the Vector experiment is enabled, make use of mediawiki.experiments to determine the bucket (mw.experiments.getBucket) and notify WikimediaEvents that an experiment is running (consider calling mw.track('wikimediaEvents.experiments')).
  • The WikimediaEvents extension, when notified that an experiment is running, should log an event which contains:
    • the user session ID
    • the experiment group the user is assigned to (A or B). This should come from the value of mw.experiments.getBucket
    • the experiment name (which is communicated to it by Vector)
    • the user ID.
  • If no experiment is running, WikimediaEvents should log nothing to the new schema.

Developer notes

The mediawiki.experiments module in core provides a lot of the infrastructure needed here.

mw.experiments.getBucket( {
    name: 'My first experiment',
    enabled: true,
    // Each bucket name maps to its share of users.
    buckets: {
        control: 0.5,
        A: 0.25,
        B: 0.25
    }
} )
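
To make the bucketing idea concrete, here is a minimal sketch of deterministic, weighted bucket selection. It is an illustration only, not the mediawiki.experiments source: pickBucket and hashToUnitInterval are hypothetical names, the FNV-1a hash is a stand-in for whatever hashing core uses, and returning 'control' for a disabled experiment is an assumption of this sketch.

```javascript
// Illustrative only: NOT the implementation behind mw.experiments.getBucket.

// Hypothetical helper: map a string to a number in [0, 1) with 32-bit FNV-1a.
function hashToUnitInterval( str ) {
	let hash = 0x811c9dc5;
	for ( let i = 0; i < str.length; i++ ) {
		hash ^= str.charCodeAt( i );
		hash = Math.imul( hash, 0x01000193 ) >>> 0;
	}
	return hash / 0x100000000;
}

// Hypothetical helper: choose a bucket for a token. The same experiment name
// and token always yield the same bucket.
function pickBucket( experiment, token ) {
	if ( !experiment.enabled ) {
		// Assumption of this sketch: disabled experiments fall back to control.
		return 'control';
	}
	const names = Object.keys( experiment.buckets );
	const total = names.reduce( ( sum, name ) => sum + experiment.buckets[ name ], 0 );
	const point = hashToUnitInterval( experiment.name + ':' + token ) * total;
	let upper = 0;
	for ( const name of names ) {
		upper += experiment.buckets[ name ];
		if ( point < upper ) {
			return name;
		}
	}
	// Guard against floating-point rounding at the upper edge.
	return names[ names.length - 1 ];
}
```

Because the result depends only on the experiment name and the token, a user keeps the same bucket across page views for as long as the weights stay unchanged.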

Notes on schema came from the events defined in the sticky header instrumentation spec

QA steps

  • For testing event logging locally, the following config will be needed in local settings:
$wgVectorWebABTestEnrollment = [
	'name' => 'vector.sticky_header_2021_11',
	'enabled' => true,
	'buckets' => [
		'unsampled' => [
			'samplingRate' => 0.1,
		],
		'control' => [
			'samplingRate' => 0.3,
		],
		'stickyHeaderDisabled' => [
			'samplingRate' => 0.3,
		],
		'stickyHeaderEnabled' => [
			'samplingRate' => 0.3,
		]
	]
];
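
A quick sanity check on a config like the one above is to confirm that every samplingRate is a valid proportion and that the rates add up to roughly 1. The sketch below mirrors the PHP array as a JavaScript object; validateEnrollmentConfig and its tolerance parameter are hypothetical, and whether Vector enforces anything similar is not stated in this task.

```javascript
// Hypothetical helper: returns a list of problems found in an enrollment config.
// The tolerance allows rounded rates such as 0.333 + 0.333 + 0.333.
function validateEnrollmentConfig( config, tolerance = 0.01 ) {
	const errors = [];
	const names = Object.keys( config.buckets || {} );
	if ( !config.name ) {
		errors.push( 'missing experiment name' );
	}
	if ( names.length === 0 ) {
		errors.push( 'no buckets defined' );
	}
	let total = 0;
	for ( const name of names ) {
		const rate = config.buckets[ name ].samplingRate;
		if ( typeof rate !== 'number' || rate < 0 || rate > 1 ) {
			errors.push( 'invalid samplingRate for ' + name );
		} else {
			total += rate;
		}
	}
	if ( names.length > 0 && Math.abs( total - 1 ) > tolerance ) {
		errors.push( 'sampling rates sum to ' + total + ', expected about 1' );
	}
	return errors;
}

// JavaScript mirror of the $wgVectorWebABTestEnrollment value shown above.
const stickyHeaderConfig = {
	name: 'vector.sticky_header_2021_11',
	enabled: true,
	buckets: {
		unsampled: { samplingRate: 0.1 },
		control: { samplingRate: 0.3 },
		stickyHeaderDisabled: { samplingRate: 0.3 },
		stickyHeaderEnabled: { samplingRate: 0.3 }
	}
};
const configErrors = validateEnrollmentConfig( stickyHeaderConfig );
```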
  • Create some new test users to make sure you have one in each group: control, stickyHeaderDisabled, stickyHeaderEnabled (eventlogging will indicate which bucket the user is in by the group key).
  • For all groups, an init event will be logged on page load (no other time) like the following:

[Screenshot of a logged init event: Screen Shot 2021-11-01 at 2.41.11 PM.png]

QA Results - Beta

AC 1: details in T292587#7510832

Event Timeline


Hi @cjming,

wiki (e.g. enwiki) and webhost (e.g. en.wikipedia.org ) are default fields enabled in event logging (ref: https://meta.wikimedia.org/wiki/Schema:EventCapsule). They will be kept in database without any further review. domain and uri are subfields under meta. meta is auto purged after 90 days. To retain meta we need to go through review process. For A/B test schema, we might not need to enable meta field if it takes your effort to do so. Hope it will save your time working on it.

thanks @jwang

wiki (e.g. enwiki) and webhost (e.g. en.wikipedia.org ) are default fields enabled in event logging (ref: https://meta.wikimedia.org/wiki/Schema:EventCapsule). They will be kept in database without any further review.

Apparently EventCapsule is a legacy schema fragment. I submitted the A/B test schema using the modern Event Platform framework so I don't think we get wiki by default.

domain and uri are subfields under meta. meta is auto purged after 90 days. To retain meta we need to go through review process. For A/B test schema, we might not need to enable meta field if it takes your effort to do so. Hope it will save your time working on it.

The common fragment which I included as a reference to the new A/B test schema gives us those meta subfields (it's trivial to remove/add them so no worries about time). I didn't realize meta fields get auto-purged - I'm guessing you want this data to persist beyond 90 days so domain and/or uri will not suffice?

Given that we don't get wiki by default using Event Platform, I'm also guessing that you would still want wiki as an explicit property in the new schema?

Sorry, I didn't know the capsule was outdated and thought wiki was effortless to get. Do you have an example of schema data produced by the modern Event Platform? I want to explore real data to confirm. If you don't have an example, that's OK. I will ask my team members next week to try their schemas generated by the modern Event Platform.

thought wiki was effortless to get.

It is super easy to get - sorry to suggest otherwise. I just wasn't sure if domain or uri would be sufficient for your queries.

I'll try to dig up documentation (apparently it's not all updated) for you in the meantime

@cjming, I can convert domain into wiki code. But it's cumbersome to work with. The wiki code is the basic key to join all tables together. It would be great if we have a wiki code field ready to join in the schema. Thanks!

sounds good @jwang - i'll keep the wiki property then

@Ottomata could the wiki property we use in the legacy schemas be added upstream? If it serves a purpose, I'd rather we didn't have to add this for every event schema going forward, particularly when it can be derived from domain after ingestion.

Adding it to the event based on wgDBName is kind of a Metrics Platform thing. We'd have to ensure somehow that all events that go through EventLogging have this field; otherwise, events with schemas that don't have this field will be invalid after EventLogging sets it.

A normalized_host field is added to all Hive event tables. Example:

normalized_host: {"project_class":"wikipedia","project":"es","qualifiers":[],"tld":"org","project_family":"wikipedia"}

That isn't the same as the wiki database name though, and I do understand that wiki db name is a pretty ubiquitous way of IDing a wiki. Hm. @jlinehan thoughts?

cjming assigned this task to nray.
cjming moved this task from Doing to Code Review on the Web-Team-Backlog (Kanbanana-FY-2021-22) board.

So as not to get stalled out, while we're waiting for final confirmation of new schema naming (see T292586#7458867), I'm moving this ticket and related patches to code review to get jump-started on feedback.

Once names are settled, I'll update all relevant patches accordingly.

hi @ovasileva @jwang @Jdlrobson cc @nray

question about the buckets - am I correct in assuming the following about the test groups for this task/AB test?

  • Control is unsampled (in this case set to 0 because we want to sample all logged in users)
    • logged-in users without sticky header
    • with instrumentation
  • A is group without feature
    • logged-in users without sticky header
    • with instrumentation
  • B is group with feature
    • logged-in users with sticky header
    • with instrumentation

Change 732089 merged by jenkins-bot:

[schemas/event/secondary@master] Add new web A/B test schema to track bucketing of users for a given experiment.

https://gerrit.wikimedia.org/r/732089

Change 732847 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Add web A/B test event logging

https://gerrit.wikimedia.org/r/732847

Change 732848 abandoned by Clare Ming:

[mediawiki/skins/Vector@master] Add A/B test event logging data hook for sticky header.

Reason:

Abandoning this changeset in favor of consolidating in https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/734778/ since both patchsets need to share config for testing

734778 addresses changes needed by both T292586 + T292587

https://gerrit.wikimedia.org/r/732848

cjming assigned this task to nray.

Per discussion with @Jdlrobson + @nray, we've agreed upon the following convention for bucketing in general and for this specific context:

General understanding of bucketing for future A/B tests
  1. Unsampled (same experience as Control)
    • no treatment
    • without instrumentation
  2. Control (same experience as Unsampled)
    • no treatment
    • with instrumentation
  3. A (or preferably a more semantic name) is a test group with one variant of a feature/treatment
    • with treatment variant 1
    • with instrumentation
  4. B (or preferably a more semantic name) is a test group with another variant of a feature/treatment
    • with treatment variant 2
    • with instrumentation
In this specific context for the current, proposed sticky header A/B test
  1. Unsampled (in this case set to 0 because we want to sample all logged in users)
    • no instrumentation
  2. Control (same experience as group A)
    • logged-in users without sticky header
    • with instrumentation
  3. A (same experience as Control)
    • has preferably a more semantic name i.e. "stickyHeaderDisabled"
    • logged-in users without sticky header
    • with instrumentation
  4. B (different experience from Control + A)
    • has preferably a more semantic name i.e. "stickyHeaderEnabled"
    • logged-in users with sticky header
    • with instrumentation

The reason in this case to have Control + A be identical is to build confidence that bucket distribution is working correctly.
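
The convention above boils down to two questions per bucket: does the user see the sticky header, and are events logged? A minimal lookup sketch follows; the bucket names come from this task, while STICKY_HEADER_BUCKETS, describeBucket, and the choice to treat unknown names as unsampled are assumptions made for illustration.

```javascript
// Bucket names are taken from this task; the table itself is a sketch.
const STICKY_HEADER_BUCKETS = {
	unsampled: { stickyHeader: false, instrumented: false },
	control: { stickyHeader: false, instrumented: true },
	stickyHeaderDisabled: { stickyHeader: false, instrumented: true },
	stickyHeaderEnabled: { stickyHeader: true, instrumented: true }
};

// Hypothetical helper: look up the treatment for a bucket name.
// Assumption: an unrecognised bucket behaves like "unsampled".
function describeBucket( bucket ) {
	return Object.prototype.hasOwnProperty.call( STICKY_HEADER_BUCKETS, bucket ) ?
		STICKY_HEADER_BUCKETS[ bucket ] :
		STICKY_HEADER_BUCKETS.unsampled;
}
```

Control and stickyHeaderDisabled deliberately map to the same treatment, which is what makes them comparable as a check on bucket distribution.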

The proposed distribution values for sticky header buckets config would correlate to:

Unsampled: 0, Control: 0.333, A (or stickyHeaderDisabled): 0.333, B (or stickyHeaderEnabled): 0.333

or

Unsampled: 0.1, Control: 0.3, A (or stickyHeaderDisabled): 0.3, B (or stickyHeaderEnabled): 0.3

@jwang based on above, is it ok with you if we update the schema property for group to change from an enum to a string so we can name the buckets more intuitively/semantically per A/B test? Then when it comes time to analyze data, hopefully it's not problematic for your queries to search by whatever the group (bucket) names are as long as you know what they are beforehand?

So in this case, we would name group A something like 'stickyHeaderDisabled', group B as 'stickyHeaderEnabled', etc.

Change 735697 had a related patch set uploaded (by Krinkle; author: Jdlrobson):

[mediawiki/extensions/WikimediaEvents@master] webABTestEnrollment: Move sampling to inside getBucket and caller

https://gerrit.wikimedia.org/r/735697

Change 735697 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] webABTestEnrollment: Move sampling to inside getBucket and caller

https://gerrit.wikimedia.org/r/735697

@cjming Yes, it would work for analysis. Thank you for checking with me.

is it ok with you if we update the schema property for group to change from an enum to a string so we can name the buckets more intuitively/semantically per A/B test? Then when it comes time to analyze data, hopefully it's not problematic for your queries to search by whatever the group (bucket) names are as long as you know what they are beforehand?

Change 735993 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[schemas/event/secondary@master] Update web_ab_test_enrollment group property

https://gerrit.wikimedia.org/r/735993

Change 735993 merged by jenkins-bot:

[schemas/event/secondary@master] Update web_ab_test_enrollment group property

https://gerrit.wikimedia.org/r/735993

cjming removed cjming as the assignee of this task.Nov 3 2021, 2:32 AM
cjming assigned this task to nray.

Test Result - Beta

Status: ✅ PASS
Environment: local
OS: macOS Big Sur
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

Create some new test users to make sure you have one in each group: control, stickyHeaderDisabled, stickyHeaderEnabled (eventlogging will indicate which bucket the user is in by the group key).
✅ AC1: For all groups, an init event will be logged on page load (no other time) like the following:

Screenshots of the logged init event, one per group:
control: Screen Shot 2021-11-17 at 9.24.40 AM.png
stickyHeaderDisabled: Screen Shot 2021-11-17 at 9.14.59 AM.png
stickyHeaderEnabled: Screen Shot 2021-11-17 at 9.10.48 AM.png

@jwang , please close this task once you have verified the events in production. I believe that is task T294639.

Hi @Edtadros , is the schema named as web_ab_test_enrollment? If so, I do not see it is available in event database.

Hi @jwang

We should be able to test this in production with a faked experiment called "jennifer.test" [1].

I'll look into this tomorrow.

[1]

mw.hook( 'mediawiki.web_AB_test_enrollment' ).fire( { experimentName: 'jennifer.test', group: 'A' } )
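
For orientation, the listener side of that hook could turn the fired payload into an event roughly as sketched below. This is not the WikimediaEvents code: buildEnrollmentEvent is a hypothetical helper, experiment_name is an assumed field name, and the context values are passed in explicitly so the sketch runs outside MediaWiki (on a wiki they would plausibly come from mw.user.sessionId() and mw.config.get( 'wgDBname' )).

```javascript
// Hypothetical helper: build the event for a hook payload such as
// { experimentName: 'jennifer.test', group: 'A' }.
// Returns null when nothing should be logged.
function buildEnrollmentEvent( data, context ) {
	if ( !data || !data.experimentName || !data.group ) {
		return null;
	}
	// Per the bucketing convention in this task, unsampled users are not instrumented.
	if ( data.group === 'unsampled' ) {
		return null;
	}
	return {
		experiment_name: data.experimentName, // assumed field name
		group: data.group,
		web_session_id: context.sessionId,
		wiki: context.wiki
	};
}

const fakeEvent = buildEnrollmentEvent(
	{ experimentName: 'jennifer.test', group: 'A' },
	{ sessionId: 'example-session-token', wiki: 'testwiki' }
);
```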

Change 742817 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/mediawiki-config@master] Enable A/B test enrollment instrumentation.

https://gerrit.wikimedia.org/r/742817

Change 742817 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable A/B test enrollment instrumentation.

https://gerrit.wikimedia.org/r/742817

Mentioned in SAL (#wikimedia-operations) [2021-12-01T00:10:03Z] <catrope@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742817|Enable A/B test enrollment instrumentation. (T292587)]] (duration: 00m 56s)

Thanks to Clare and Nick we've managed to send a few events to the mediawiki.web_AB_test_enrollment schema. Jennifer, when you have time, could you check that you are seeing them in the database?

Change 745586 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[mediawiki/skins/Vector@master] Update A/B test enrollment name

https://gerrit.wikimedia.org/r/745586

Change 745607 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[mediawiki/skins/Vector@wmf/1.38.0-wmf.12] Update A/B test enrollment name

https://gerrit.wikimedia.org/r/745607

Change 745586 merged by jenkins-bot:

[mediawiki/skins/Vector@master] Update A/B test enrollment name

https://gerrit.wikimedia.org/r/745586

Change 745607 merged by jenkins-bot:

[mediawiki/skins/Vector@wmf/1.38.0-wmf.12] Update A/B test enrollment name

https://gerrit.wikimedia.org/r/745607

Mentioned in SAL (#wikimedia-operations) [2021-12-10T00:33:27Z] <cjming@deploy1002> Synchronized php-1.38.0-wmf.12/skins/Vector: Backport: [[gerrit:745607|Update A/B test enrollment name (T292587)]] (duration: 00m 56s)

Queried at 2pm on Dec 13, 2021. No events were found in schema mediawiki_web_ab_test_enrollment.

query used:

select *
FROM event.mediawiki_web_ab_test_enrollment
WHERE year=2021

The events are now available in schema mediawiki_web_ab_test_enrollment, but we only see control and unsampled in the group field. We do not see a treatment group yet.

Query:

select distinct `group`
FROM event.mediawiki_web_ab_test_enrollment
WHERE year=2021

Return:

group
control
unsampled

For production QA, given this is now on test.wikipedia we can QA the data coming in using hue.wikimedia.org

For test wikipedia, I'm seeing the following results [1]

group                   count
control                 170
stickyHeaderDisabled    11
stickyHeaderEnabled     90
unsampled               15

Note the unsampled events are due to a bug that should have since been fixed (but will log while cached JS is served).

When I query against distinct tokens [2] I see:

group                   distinct tokens
control                 11
stickyHeaderDisabled    3
stickyHeaderEnabled     8
unsampled               5

The stickyHeaderDisabled, stickyHeaderEnabled and control buckets should be equal in size. The sample size is too small to verify that here, but it's a good sign that we have events for each bucket, so I'm reasonably confident at this point.

[1] select `group`, count(*) from mediawiki_web_ab_test_enrollment WHERE month = 12 AND YEAR = 2021 AND wiki = 'testwiki' group by `group`
[2] select `group`, count(distinct web_session_id) from mediawiki_web_ab_test_enrollment WHERE month = 12 AND YEAR = 2021 AND wiki = 'testwiki' group by `group`
[3] select day from mediawiki_web_ab_test_enrollment WHERE month = 12 AND YEAR = 2021 AND wiki = 'testwiki' AND `group` = 'unsampled'
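
One way to put a number on the "equal in size" expectation is a chi-square goodness-of-fit statistic over the three instrumented buckets. The sketch below uses the distinct-token counts reported above; chiSquareEqualShares is a hypothetical helper, and with counts this small the result is only a rough indication.

```javascript
// Hypothetical helper: chi-square statistic against the hypothesis that
// every bucket receives an equal share of the total.
function chiSquareEqualShares( counts ) {
	const names = Object.keys( counts );
	const total = names.reduce( ( sum, name ) => sum + counts[ name ], 0 );
	if ( total === 0 ) {
		return 0;
	}
	const expected = total / names.length;
	return names.reduce(
		( sum, name ) => sum + Math.pow( counts[ name ] - expected, 2 ) / expected,
		0
	);
}

// Distinct-token counts from the testwiki query above (unsampled excluded).
const statistic = chiSquareEqualShares( {
	control: 11,
	stickyHeaderDisabled: 3,
	stickyHeaderEnabled: 8
} );
// With 3 buckets there are 2 degrees of freedom; the 5% critical value is 5.991.
const looksUneven = statistic > 5.991;
```

Here the statistic is about 4.45, below the 5% critical value for 2 degrees of freedom, so these counts do not contradict an even split. That agrees with the observation that the sample is too small to conclude much either way.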

@Jdlrobson, My original understanding is that each token (session_id) will only be recorded once in this AB test schema. May I know why the token is repeatedly recorded in the schema? If one token (session_id) is only sent into one experiment group, then the other rows of data are redundant.

@Jdlrobson, My original understanding is that each token (session_id) will only be recorded once in this AB test schema. May I know why the token is repeatedly recorded in the schema? If one token (session_id) is only sent into one experiment group, then the other rows of data are redundant.

I'm not sure I follow the question. The AB test schema logs on every page view for users that are inside the A/B test while the A/B test is running. Provided everything goes to plan, every event logged for a given user token will show the same bucket. This may be redundant, but having events on each page view will give us increased confidence that the bucketing is working correctly and allow us to link them confidently with other schemas, e.g. web_ui_scroll, via matching timestamps and user tokens.

We have no way to reliably tell whether we already sent an event to mediawiki_web_ab_test_enrollment for the current user, so it wouldn't be practical to send the event just once without risking losing information.

When the A/B test is turned off or not working properly, no event will be logged so it is important when we look at any other schema e.g. web_ui_scroll that we check there is a corresponding event with the same user token in mediawiki_web_ab_test_enrollment.

Note, if during the A/B test we change sampling rate (due to request from analytics to lower traffic), the bucket for a given user token may change.
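
The two checks described above, one bucket per token and a matching enrollment event for every event in another schema, can be sketched over in-memory rows as follows. The helper names are hypothetical, and the assumption that the other schema (e.g. web_ui_scroll) exposes the token as web_session_id is mine; in practice this would more likely be a join in the query engine.

```javascript
// Hypothetical helper: map each session token to the set of buckets seen for it.
function bucketsByToken( enrollmentEvents ) {
	const seen = new Map();
	for ( const event of enrollmentEvents ) {
		if ( !seen.has( event.web_session_id ) ) {
			seen.set( event.web_session_id, new Set() );
		}
		seen.get( event.web_session_id ).add( event.group );
	}
	return seen;
}

// Tokens logged under more than one bucket, e.g. after a sampling rate change.
function tokensWithConflictingBuckets( enrollmentEvents ) {
	const conflicts = [];
	for ( const [ token, groups ] of bucketsByToken( enrollmentEvents ) ) {
		if ( groups.size > 1 ) {
			conflicts.push( token );
		}
	}
	return conflicts;
}

// Keep only events from another schema whose token also has an enrollment event.
// Assumption: the other schema carries the token in web_session_id.
function keepEnrolled( otherEvents, enrollmentEvents ) {
	const seen = bucketsByToken( enrollmentEvents );
	return otherEvents.filter( ( event ) => seen.has( event.web_session_id ) );
}
```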

@Jdlrobson. Understand. Thanks for the explanation.

From standup:

10% unsampled
30% will see the sticky header
30% will be in the control group and not see the sticky header
30% will be in the "stickyHeaderDisabled" and will not see the sticky header