Turn A/B test for page issues on for test projects at low rates to validate instrumentation
Closed, Declined · Public · 2 Story Points

Description

Background

We'd like to turn on the A/B test for page issues so that we can easily perform QA across different projects.
The test will be run on the following projects:

  • English
  • Russian
  • Japanese
  • Catalan

Acceptance criteria

  • turn on A/B test for all projects at a negligibly low rate

TODO

Define the negligibly low rate

Event Timeline

ovasileva triaged this task as High priority. Sep 17 2018, 10:23 PM
ovasileva created this task.
Restricted Application added a subscriber: Aklapper. Sep 17 2018, 10:23 PM

@Tbayer - how small should we go? Also - if we define a rate small enough, do we also want to include the wikis we want to analyze the data for here or should we exclude them until the remaining bugs with the instrumentation are fixed?

So to clarify, this task is not about launching the actual A/B test where we'll collect data to evaluate the effects of the new design (T200792 already covers that per the last AC there). Rather, it's about deploying the new design and the instrumentation to production, so that we can test both before (a non-negligible number of) users see it. Correct?

@Tbayer - how small should we go?

Well, I would say the lowest rate that still allows testers to bucket themselves into the experimental group (i.e. the new design). That's a question for the developers. But considering that we sample by sessionToken, which now has 80 bits, 1/2^80 may be the theoretical limit. Alternatively, I asked on Slack earlier today whether it might actually be possible to set it to 0% and still be able to activate this for oneself somehow (i.e. even though there would be no possible value of sessionToken that one could set to be in the experimental group). @Jdlrobson, thoughts?
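To make the 1/2^80 limit concrete, here is a minimal sketch of token-based bucketing. It is not the actual instrumentation code: it assumes the 80-bit sessionToken is read as a hexadecimal integer and compared against a threshold derived from the sampling rate, and the function names are hypothetical.

```python
from fractions import Fraction

TOKEN_BITS = 80               # per the comment above: sessionToken has 80 bits
TOKEN_SPACE = 2 ** TOKEN_BITS  # number of distinct token values


def bucket_threshold(rate: Fraction) -> int:
    """Count of token values (out of 2**80) that land in the experimental group.

    Hypothetical helper: the rate is rounded down to a whole number of tokens.
    """
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return int(rate * TOKEN_SPACE)


def in_experimental_group(session_token_hex: str, rate: Fraction) -> bool:
    """Assumed bucketing rule: a token is sampled if its value is below the threshold."""
    token = int(session_token_hex, 16)
    if not 0 <= token < TOKEN_SPACE:
        raise ValueError("token does not fit in 80 bits")
    return token < bucket_threshold(rate)


# Smallest non-zero rate: exactly one token value (all zeros) is in the group.
lowest_rate = Fraction(1, TOKEN_SPACE)
print(bucket_threshold(lowest_rate))                      # 1
print(in_experimental_group("0" * 20, lowest_rate))       # True
# At a rate of 0 no token value qualifies, so a tester cannot self-bucket this way.
print(in_experimental_group("0" * 20, Fraction(0)))       # False
```

Under these assumptions, a tester can still self-bucket at a rate of 1/2^80 by setting the one qualifying token value, whereas at 0% some other activation mechanism would be needed.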

Also - if we define a rate small enough, do we also want to include the wikis we want to analyze the data for here or should we exclude them until the remaining bugs with the instrumentation are fixed?

I think that depends on the answer to the question above. There is no great harm in collecting some possibly faulty data until the instrumentation bugs are fully fixed (we can just exclude that from analysis). But IIRC we also wanted to avoid serving broken versions of the design to users, at least for feature bugs that are avoidable in the sense that we could have found them in our planned QA on a smaller set of wikis.

ovasileva updated the task description. Sep 18 2018, 4:26 PM
Jdlrobson updated the task description. Sep 18 2018, 4:26 PM

So to clarify, this task is not about launching the actual A/B test where we'll collect data to evaluate the effects of the new design (T200792 already covers that per the last AC there). Rather, it's about deploying the new design and the instrumentation to production, so that we can test both before (a non-negligible number of) users see it. Correct?

Correct

@Tbayer - how small should we go?

Well, I would say the lowest rate that still allows testers to bucket themselves into the experimental group (i.e. the new design). That's a question for the developers. But considering that we sample by sessionToken, which now has 80 bits, 1/2^80 may be the theoretical limit. Alternatively, I asked on Slack earlier today whether it might actually be possible to set it to 0% and still be able to activate this for oneself somehow (i.e. even though there would be no possible value of sessionToken that one could set to be in the experimental group). @Jdlrobson, thoughts?

Also - if we define a rate small enough, do we also want to include the wikis we want to analyze the data for here or should we exclude them until the remaining bugs with the instrumentation are fixed?

I think that depends on the answer to the question above. There is no great harm in collecting some possibly faulty data until the instrumentation bugs are fully fixed (we can just exclude that from analysis). But IIRC we also wanted to avoid serving broken versions of the design to users, at least for feature bugs that are avoidable in the sense that we could have found them in our planned QA on a smaller set of wikis.

Let's try the smaller set then.

ovasileva updated the task description. Sep 18 2018, 4:29 PM
ovasileva set the point value for this task to 2. Sep 18 2018, 4:32 PM

As I mentioned in standup, a safer approach would be to allow manual opt-in to the new A/B test via a query string parameter, e.g. debug=true or issues=A / issues=B, if this is acceptable.
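A rough sketch of how such a query string override could work, for illustration only. The parameter name issues and the values A and B come from the suggestion above; the function names, the precedence rule (override wins over sampling), and the example URLs are assumptions, not the deployed behavior.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

OVERRIDE_PARAM = "issues"    # parameter name floated in the comment above
VALID_BUCKETS = {"A", "B"}   # values floated in the comment above


def bucket_override(url: str) -> Optional[str]:
    """Return the bucket requested via the query string, or None if absent or invalid."""
    values = parse_qs(urlsplit(url).query).get(OVERRIDE_PARAM, [])
    if values and values[-1].upper() in VALID_BUCKETS:
        return values[-1].upper()
    return None


def resolve_bucket(url: str, sampled_bucket: Optional[str]) -> Optional[str]:
    """Assumed precedence: a manual override wins over token-based sampling.

    sampled_bucket is None when the session was not sampled into the test.
    """
    override = bucket_override(url)
    return override if override is not None else sampled_bucket


# A tester opts in explicitly even though the sampling rate put them in no bucket.
print(resolve_bucket("https://example.org/wiki/Foo?issues=B", None))  # B
# Without the parameter, regular visitors keep whatever sampling decided.
print(resolve_bucket("https://example.org/wiki/Foo", None))           # None
```

With an override like this, the sampling rate could stay at 0% during QA, since testers would not depend on sessionToken to reach either variant.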

ovasileva renamed this task from Turn A/B test for page issues on for all projects to Turn A/B test for page issues on for test projects. Sep 20 2018, 8:11 PM
ovasileva renamed this task from Turn A/B test for page issues on for test projects to Turn A/B test for page issues on for test projects at low rates to validate instrumentation.
ovasileva updated the task description.

@Tbayer @ovasileva I'm still a little confused by this task - T200792 will need several incremental changes (we shouldn't switch from 0 to 20% in one deploy), so I'm wondering if that task already covers what's being asked here. Please take a look!