
Instrument mobile visual editor for a section editing experiment
Closed, Resolved, Public

Description

We would like to run a controlled experiment to see how section editing affects editing behavior. To that end, we would like to instrument as follows:

  • Split registered users 50%-50% between section editing and the current experience. Unregistered editors can get either experience, but I would recommend sticking with the current experience until we are done with the test. The bucketing should be deterministic based on the user ID (e.g. user ID mod 2) so that I can later identify data corresponding to each group based on the user ID.
  • Avoid bouncing users between multiple experiences (as @Esanders reminded me). Since section editing is already rolled out to all users on Hebrew, Cantonese, and Bengali Wikipedias, the bucketing should be kept off for these wikis. This also helps us get good qualitative feedback from these wikis without the confusion of half the users not getting the feature. On the other hand, the bucketing should start the moment the feature is deployed on all other wikis. The first additional deployment is scheduled for Monday, March 25. @ppelberg and I agree that we should delay this if necessary to allow time for instrumentation.
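
The two requirements above could be sketched roughly as follows. This is only an illustration, not the actual VisualEditor code: the function name, the wiki identifiers in the set, and the choice of which parity gets section editing are all assumptions made for the example.

```python
# Illustrative identifiers for the wikis where section editing is already
# enabled for everyone (assumed names, not taken from the real config).
FULL_ROLLOUT_WIKIS = {"hewiki", "zh_yuewiki", "bnwiki"}


def section_editing_enabled(wiki, user_id):
    """Return True if this user should get section editing on this wiki.

    user_id is the local numeric user ID, or None for unregistered editors.
    """
    if wiki in FULL_ROLLOUT_WIKIS:
        # No bucketing on wikis that already have the feature for all users.
        return True
    if user_id is None:
        # Unregistered editors keep the current experience during the test.
        return False
    # Deterministic 50%-50% split: assume odd IDs get section editing.
    return user_id % 2 == 1
```

Because the split depends only on the user ID, the same expression (`user_id % 2`) can be reapplied at analysis time to recover each user's group.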

There are some things I have previously suggested, but I don't think are useful right now:

  • Bucket based on a hash of the user ID and the experiment name. This feels like a best practice so that we aren't using the exact same bucketing logic as any concurrent experiments, but our current analysis database version doesn't support any widely-used hash functions (T203498). I'm pretty sure there aren't any concurrent experiments going on, so it doesn't seem worth the effort.
  • Create a bucket field in the EditAttemptStep schema and send a group name along with any events. The user ID is permanently kept in our dataset, so I can just reapply the bucketing logic when fetching data rather than doing the extra instrumentation work.
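
For reference, the hash-based bucketing that is being skipped here might look something like the sketch below. The experiment name, bucket labels, and the choice of SHA-256 are assumptions for illustration; nothing in this task prescribes them.

```python
import hashlib


def bucket(user_id, experiment, buckets=("control", "treatment")):
    """Assign a user to a bucket from a hash of the experiment name and user ID.

    Including the experiment name in the hashed key means two concurrent
    experiments do not split users along exactly the same lines.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket index.
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]
```

The trade-off noted above still applies: the analysis side would need the same hash function available in order to reapply this logic when fetching data.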

Event Timeline

Given the time constraints I'll skip the nice-to-haves, and we can just bucket users by odd/even userIDs.

Change 498077 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Introduce a 'mobile-ab' config option for section editing

https://gerrit.wikimedia.org/r/498077

The above patch introduces a new 'mobile-ab' config setting that turns on mobile section editing for odd-numbered users. We can use that setting on the next set of wikis we deploy to.

Change 498084 had a related patch set uploaded (by Esanders; owner: Esanders):
[operations/mediawiki-config@master] VE section editing: Enable mobile AB test on remaining target wikis

https://gerrit.wikimedia.org/r/498084

The above patch introduces a new 'mobile-ab' config setting that turns on mobile section editing for odd-numbered users. We can use that setting on the next set of wikis we deploy to.

Thank you: it looks like exactly what we need!

Given the time constraints I'll skip the nice-to-haves, and we can just bucket users by odd/even userIDs.

Agreed—that list was just to explain why I had decided against some ideas I had mentioned previously (mostly to @ppelberg).

For future reference, it should be noted that user IDs are not persistent across wikis for the same user, so if we have a more complex bucket function we should use the username to ensure users end up in the same bucket across wikis.
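
A small sketch of the difference, using a made-up account whose local IDs on two wikis are invented for the example (the function name, experiment name, and hash choice are likewise assumptions):

```python
import hashlib


def bucket_by_username(username, experiment):
    """Bucket on the username, which is the same on every wiki for one account."""
    key = f"{experiment}:{username}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return "treatment" if digest[0] % 2 else "control"


# Hypothetical local user IDs for one account on two wikis.
local_ids = {"enwiki": 1234, "frwiki": 5677}

# Parity of the local ID can differ per wiki, so the same person could
# land in different groups depending on where they edit.
parity_by_wiki = {wiki: uid % 2 for wiki, uid in local_ids.items()}

# The username-based bucket is computed without any wiki-specific input.
stable_bucket = bucket_by_username("ExampleUser", "section-editing")
```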

Section Editing Config Date Change

The date on which the Section Editing deployment config change will be made has been revised to 28-March from 25-March.

On 28-March, Section Editing will go live on the wikis listed in the description of T218939.

  1. Section Editing should not be deployed to the wikis listed in T218939 until the vanishing-toolbars-while-scrolling patch (T218414) is live on those wikis.

Assuming nothing catastrophic happens, this should be next Thursday, 28th March.

*The update above is intended simply as that. It does not indicate that any further action is required on this task, T218851, as a result of this date change.

For future reference, it should be noted that user IDs are not persistent across wikis for the same user, so if we have a more complex bucket function we should use the username to ensure users end up in the same bucket across wikis.

Good point! I documented this at a page I just created: wikitech:A/B testing. Feel free to dump other thoughts and links there too.

Change 498077 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Introduce a 'mobile-ab' config option for section editing

https://gerrit.wikimedia.org/r/498077

Change 498084 merged by jenkins-bot:
[operations/mediawiki-config@master] VE section editing: Enable mobile AB test on remaining target wikis

https://gerrit.wikimedia.org/r/498084

Mentioned in SAL (#wikimedia-operations) [2019-03-28T20:48:26Z] <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: VisualEditor: Enable mobile section editing A/B test on 10 Wikipedias T218851 T218939 (duration: 00m 50s)

Extending A/B test to all remaining wikis

In the context of Section Editing becoming live tomorrow (2-April) for contributors to all remaining wikis, this question emerged: "Will contributors to all remaining wikis be a part of our ongoing Section Editing A/B test (T218851)?"

Decided

Yes. Per a conversation today with @Neil_P._Quinn_WMF, contributors to all remaining wikis should be included in our ongoing Section Editing A/B test (T218851). This is relevant to tomorrow's config change deployment: T219564.

Open

  • How long will the A/B test last? This depends on how long it will take to reach a statistically significant sample size, a question @Neil_P._Quinn_WMF is investigating.

cc @Esanders @Whatamidoing-WMF


Edit: removed "200 wikis" from comment title – thank you, @Whatamidoing-WMF

Doesn't "all remaining wikis" mean something closer to 800 (not 200)?

Good point, I wasn't counting sister projects.

A/B test duration

Open

  • How long will the A/B test last? This depends on how long it will take to reach a statistically significant sample size, a question @Neil_P._Quinn_WMF is investigating.

The Section Editing A/B test will run through 14-May-2019. This is based on the power analysis @Neil_P._Quinn_WMF did, the results of which are documented in the Section Editing Measurement Plan.
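
For readers unfamiliar with power analysis, the general idea can be sketched as below. This is not the analysis referenced above: it uses the textbook normal-approximation formula for comparing two proportions, and the rates in the usage line are hypothetical placeholders rather than figures from the measurement plan.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_control vs p_treatment.

    Two-sided test of two proportions, normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical rates, for illustration only.
needed = sample_size_per_group(0.10, 0.12)
```

Dividing the required sample size by the expected number of new participants per day gives a rough estimate of how long a test needs to run.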


The duration (28-March - 14-May) of the Section Editing A/B test is now represented in the Editing Team Google Calendar.


cc @Esanders @WhatamIdoing

The Section Editing A/B test will run through 14-May-2019.

Yup! One note: this doesn't necessarily mean we should end the 50%-50% rollout at that point, because that would require deciding whether to turn it on or off for all users. Instead, it would be better to start the analysis at that point, and leave the split in place until we can decide our long-term plan based on the results.

In T218851#5095956, @Neil_P._Quinn_WMF wrote:

The Section Editing A/B test will run through 14-May-2019.

Yup! One note: this doesn't necessarily mean we should end the 50%-50% rollout at that point, because that would require deciding whether to turn it on or off for all users. Instead, it would be better to start the analysis at that point, and leave the split in place until we can decide our long-term plan based on the results.

Good call – thank you for clarifying, @Neil_P._Quinn_WMF.

T211239 is now updated to reflect this thinking.