We'd like to run an A/B test on wikipedia.org portal visitors, with a control group and a test group that sees new descriptive text for the sister (other) projects, and we want to be sure the upcoming test is valid and complete from an analysis standpoint.
The details on the actual wording for the descriptive text are in this story's epic: https://phabricator.wikimedia.org/T131238
Bucket testing logic generally is as follows:
- 1 in 200 people are included in EventLogging
- Of those 1 in 200 people, we first check whether they have en in their Accept-Language header; if they do, we do a 1 in 5 check to enroll them in the test (this is because the project descriptions are written in English)
- Of those eligible & randomly selected people, 50% go into a test group with the cohort name "descriptive-text-b", and 50% go into a control group with the cohort name "descriptive-text-a"
- Everyone else in the 1 in 200 EventLogging sample gets a NULL cohort (either the string "null" or a MySQL NULL; we can detect either).
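The bucketing steps above can be sketched roughly as follows. This is an illustrative sketch only, not the portal's actual JavaScript; the function name and return values are assumptions, and the exact treatment of the non-enrolled chunk (string "null" vs. no event at all) may differ in the real implementation.

```python
import random

def bucket_visitor(accept_language, rng=random):
    """Hypothetical sketch of the bucketing logic described above.

    Returns a cohort string for visitors in the EventLogging sample,
    or None for visitors who are not sampled at all.
    """
    # 1 in 200 visitors are included in EventLogging
    if rng.randint(1, 200) != 1:
        return None  # not sampled; no event is logged
    # Only visitors with 'en' in Accept-Language are eligible,
    # because the project descriptions are written in English
    if 'en' not in accept_language:
        return 'null'
    # Of eligible visitors, 1 in 5 are enrolled in the test
    if rng.randint(1, 5) != 1:
        return 'null'
    # Enrolled visitors split 50/50 between test and control
    return 'descriptive-text-b' if rng.random() < 0.5 else 'descriptive-text-a'
```

With enough simulated visitors, all four outcomes (not sampled, "null", control, test) show up in roughly the expected 199:796:2:2 proportions per 1000 sampled-eligible visitors.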
The goal is an increased clickthrough rate on the sister project links as a result of the added descriptive text, or at least no decrease in the existing clickthrough rate, as shown on the portal dashboard: http://discovery.wmflabs.org/portal/#action_breakdown
Note that because that section gets so little traffic, we've upped the sampling to 1 in 5, and we'll probably have to keep the test running for more than a week. Mikhail will need to check how wide the credible intervals are; if they're too wide (too much uncertainty), we'll keep the test running until we have more certainty in our estimates.
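To illustrate the "too wide" check, here is a minimal sketch of a credible interval for a clickthrough rate, assuming a Beta-Binomial model with a uniform Beta(1, 1) prior and a Monte Carlo approximation of the posterior quantiles. This is not Mikhail's actual analysis code; the function name and the choice of prior are assumptions for illustration.

```python
import random

def credible_interval(clicks, impressions, level=0.95, draws=100000, seed=42):
    """Monte Carlo credible interval for a clickthrough rate.

    Assumes a Beta(1, 1) prior, so the posterior for the CTR is
    Beta(clicks + 1, impressions - clicks + 1).
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(clicks + 1, impressions - clicks + 1)
                     for _ in range(draws))
    # Take the central `level` mass of the posterior samples
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws)]
    return lo, hi

# Same 5% observed CTR, but more traffic gives a narrower interval:
wide = credible_interval(5, 100)      # low-traffic week
narrow = credible_interval(50, 1000)  # after running the test longer
```

The point of the sketch: at the same observed rate, the interval from 100 impressions is several times wider than the one from 1000, which is exactly why a low-traffic section may need the test to run longer than a week.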