
A/B Test (Egg): Specifications on the gathering of data for the descriptive text on sister project links test
Closed, Resolved · Public · 1 Estimated Story Points

Description

We'd like to run an A/B test on wikipedia.org portal visitors, with a control group and a test group that sees new descriptive text for the sister (other) project links. Before it launches, we want to be sure the test is valid and complete from an analysis standpoint.

The details on the actual wording for the descriptive text are in this story's epic: https://phabricator.wikimedia.org/T131238

The bucketing logic for the test is generally as follows:

  • 1 in 200 people are included in EventLogging
  • Of those 1 in 200 people, we first check whether they have en in their accept-language; if they do, we do a 1 in 5 check to enroll them in the test (this is because the project descriptions are written in English)
  • Of those eligible and randomly selected people, 50% go in a test group, with the cohort name "descriptive-text-b", and 50% go in a control group, with the cohort name "descriptive-text-a"
  • The rest of the sampled people (those not enrolled in the test) get a NULL cohort (the string null or the MySQL NULL; we can detect either).
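The bucketing steps above can be sketched roughly as follows. This is only an illustration of the sampling logic as described, not the portal's actual code: the function names, the injected random number generator, and the exact rule for deciding that an accept-language header "has en" (here: any `en` or `en-*` entry) are assumptions.

```python
import random
from typing import Optional, Tuple

EVENTLOGGING_RATE = 1 / 200   # 1 in 200 visitors are included in EventLogging
TEST_ENROLL_RATE = 1 / 5      # 1 in 5 of the eligible sampled visitors are enrolled
COHORT_CONTROL = "descriptive-text-a"
COHORT_TEST = "descriptive-text-b"


def prefers_english(accept_language: str) -> bool:
    """Assumed rule: True if any entry in the header is 'en' or 'en-*'."""
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower()  # drop any ';q=...' weight
        if tag == "en" or tag.startswith("en-"):
            return True
    return False


def assign_bucket(accept_language: str, rng) -> Tuple[bool, Optional[str]]:
    """Return (in_eventlogging, cohort); cohort None stands for the NULL cohort."""
    if rng.random() >= EVENTLOGGING_RATE:
        return (False, None)          # not sampled into EventLogging at all
    if not prefers_english(accept_language):
        return (True, None)           # sampled, but not eligible for this test
    if rng.random() >= TEST_ENROLL_RATE:
        return (True, None)           # eligible, but not selected by the 1 in 5 check
    # Enrolled: split 50/50 between test and control.
    cohort = COHORT_TEST if rng.random() < 0.5 else COHORT_CONTROL
    return (True, cohort)


if __name__ == "__main__":
    rng = random.Random(2016)
    print(assign_bucket("en-US,en;q=0.8", rng))
```

Under these rates, about 1 in 1,000 visitors with en in their accept-language ends up enrolled (1/200 × 1/5), or about 1 in 2,000 per cohort, which is why the test may need to run for a while.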

The goal is an increased clickthrough rate on the sister project links, driven by the addition of descriptive text for those links, or at least no decrease from the existing clickthrough rate shown on the portal dashboard: http://discovery.wmflabs.org/portal/#action_breakdown

Note that because that section gets so little traffic, we've upped the sampling to 1 in 5, and we'll probably have to keep the test running for more than a week. Mikhail will need to check how wide the credible intervals are; if they're too wide (too much uncertainty), we'll have to keep the test running until we have more certainty in our estimates.
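To illustrate what "too wide" means here, the following is a minimal sketch of one way to get a credible interval for the difference in clickthrough rates between the two cohorts. It is not necessarily the analysis Mikhail will run: the uniform Beta(1, 1) prior, the Monte Carlo approach, the function name, and all of the counts are assumptions made up for the example.

```python
import random


def credible_interval_diff(clicks_a, visits_a, clicks_b, visits_b,
                           level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval for (CTR of b) - (CTR of a).

    Assumes independent Beta(1, 1) priors on each cohort's clickthrough
    rate, so each posterior is Beta(1 + clicks, 1 + visits - clicks).
    """
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(1 + clicks_b, 1 + visits_b - clicks_b)
        - rng.betavariate(1 + clicks_a, 1 + visits_a - clicks_a)
        for _ in range(draws)
    )
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return diffs[lo_idx], diffs[hi_idx]


if __name__ == "__main__":
    # Hypothetical counts: the same observed rates with 100x more data.
    early = credible_interval_diff(5, 100, 8, 100)
    later = credible_interval_diff(500, 10000, 800, 10000)
    print("early interval width:", early[1] - early[0])
    print("later interval width:", later[1] - later[0])
```

With the same observed rates, the interval narrows as more visits accumulate. An interval that still comfortably spans zero suggests the test has not yet run long enough to say whether the descriptive text changed the clickthrough rate.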

Event Timeline

Hi @mpopov - can you take a look at this test and let us know if it looks good from an analysis standpoint?

Thanks!

mpopov raised the priority of this task from Medium to Needs Triage.
mpopov updated the task description.
mpopov set the point value for this task to 1.
Deskana triaged this task as High priority. Apr 19 2016, 8:16 PM
Deskana added a subscriber: Deskana.

Needs urgent attention. Mostly just validation of the proposed framework, so shouldn't be too hard.

Sounds good! I adjusted the description a little bit but yeah, the sampling sounds solid and the analysis will be pretty straightforward (unlike the language detection test).

@debt FYI! Resolving this since Mikhail has signed off on it.

mpopov added a subscriber: Jdrewniak.

Updated the description after a discussion with @debt and @Jdrewniak about who we're really targeting with the test.

debt renamed this task from "Specifications on the gathering of data for the descriptive text on sister project links test" to "A/B Test (Egg): Specifications on the gathering of data for the descriptive text on sister project links test". Apr 26 2016, 4:51 PM