VE mobile default: design A/B test
Closed, Resolved | Public | May 29 2019

Assigned To

Authored By

	ppelberg
	Apr 17 2019, 6:45 AM

Details

Due Date: May 29 2019, 7:00 AM

Subject	Repo	Branch	Lines +/-
VE-as-default abtest should only apply to users with <100 edits	mediawiki/extensions/MobileFrontend	master	+5 -2
Bump EditAttemptStep schema	mediawiki/extensions/WikimediaEvents	master	+1 -1
Allow the default mobile editor to be configured	mediawiki/extensions/MobileFrontend	master	+175 -68


Related Objects

Status	Subtype	Assigned	Task
Open		None	T255327 [Epic] Evaluate which editing interface should be shown by default
Open		None	T227338 Test visual editor as the default mobile editor on select wikis
Duplicate		ppelberg	T221187 VE mobile default: create measurement specifications and experiment plan
Resolved	May 29 2019	nshahquinn-wmf	T221195 VE mobile default: design A/B test
Resolved		JTannerWMF	T225209 Consult legal about A/B test instrumentation

Event Timeline


ppelberg removed a subtask: T221196: Set mobile VE as default for target wikis in A/B test.Apr 17 2019, 6:54 AM

ppelberg renamed this task from Design "VE as mobile default" A/B test to VE mobile default: design A/B test.Apr 17 2019, 7:11 AM

ppelberg updated the task description. (Show Details)

ppelberg updated the task description. (Show Details)May 6 2019, 11:22 PM

ppelberg updated the task description. (Show Details)

ppelberg mentioned this in T222803: Evaluate target wiki candidates for Q4 work.May 8 2019, 3:42 PM

ppelberg updated the task description. (Show Details)May 14 2019, 9:11 PM

ppelberg added a subtask: T223339: Re-run metrics from VE on mobile report .May 14 2019, 11:55 PM

ppelberg removed a subtask: T223339: Re-run metrics from VE on mobile report .May 15 2019, 12:48 AM

ppelberg updated the task description. (Show Details)May 15 2019, 12:52 AM

Update | 14-May

@Neil_P._Quinn_WMF and I discussed:

The sample size for this A/B test will be further limited because making VE the default will not override existing mobile contributors' sticky preferences. Said another way: the only contributors who can be included in this test are those who have not yet opened the editor on their mobile device.
- Considering the small sample size and the non-invasive nature of making VE the default [1], we ought to consider including more wikis in the test.
Test group bucketing: @Neil_P._Quinn_WMF to explore bucketing contributors using a cookie

Next steps

@Neil_P._Quinn_WMF to figure out whether bucketing contributors into test group by browser cookies will work for this experiment
@ppelberg to check with @Whatamidoing-WMF whether she is comfortable increasing the number of wikis we include in this A/B test

[1] Non-invasive nature: existing mobile contributors' editing workflows will not be affected (their sticky preferences are preserved), and we are not introducing any new functionality.

ppelberg updated the task description. (Show Details)May 15 2019, 1:16 AM

ppelberg updated the task description. (Show Details)May 15 2019, 1:23 AM

ppelberg added a project: Product-Analytics.May 15 2019, 6:16 PM

ppelberg added a subscriber: • iamjessklein.May 15 2019, 6:23 PM

ppelberg added a subscriber: Esanders.May 15 2019, 6:25 PM

One of the open questions about the A/B test is scale: What can we do to make sure we have enough edit sessions to draw meaningful conclusions from the A/B test with a reasonable level of precision and within a reasonable amount of time?

Prior to today's Planning Meeting, we'd been thinking to only include new contributors in the A/B test in order to minimize the disruptiveness of the change. Considering how this decision would further limit the number of sessions we can analyze, we started to think about including more wikis in the A/B test.

However, today @DLynch suggested that, since a sticky preference is only set after a contributor explicitly chooses one editor (wikitext or VE) (thanks @matmarex for confirming), perhaps it makes sense to also include all contributors [to the target wikis we select] who do not have a sticky preference set (read: they've followed the default presented to them and have not made a deliberate choice about which editing interface they prefer).

Resulting questions

Confirm: we can distinguish between contributors who do and do not have a sticky editing interface preference set (added to task description)
Confirm: what impact, in terms of scale, does including all contributors who do not have a sticky preference set have on the number of edit sessions? (added to task description)

ppelberg updated the task description. (Show Details)May 15 2019, 7:07 PM

Update | 16-May

Sticky preferences: @Whatamidoing-WMF and I talked today and decided there's no issue with including all contributors who do not have a sticky preference set in the A/B test. This means that contributors who have used wikitext on mobile before, but have not switched to mobile VE and back again to wikitext, will potentially be included in the test group. (Reflected in the task's description.)

Expanding list of wikis to be included in A/B test...

In T221195#5183253, @ppelberg wrote:

@ppelberg to check with @Whatamidoing-WMF whether she is comfortable increasing the number of wikis we include in this A/B test

@Whatamidoing-WMF is comfortable including more wikis in the A/B test. See: T222803

In T221195#5187977, @ppelberg wrote:

Update | 16-May

Sticky preferences: @Whatamidoing-WMF and I talked today and decided there's no issue with including all contributors who do not have a sticky preference set in the A/B test. This means that contributors who have used wikitext on mobile before, but have not switched to mobile VE and back again to wikitext, will potentially be included in the test group. (Reflected in the task's description.)

I don't think we distinguish between users who have used the editor and users who have switched. I think we store the preference as soon as any editor is opened.

In T221195#5187989, @Esanders wrote:

In T221195#5187977, @ppelberg wrote:

Update | 16-May

Sticky preferences: @Whatamidoing-WMF and I talked today and decided there's no issue with including all contributors who do not have a sticky preference set in the A/B test. This means that contributors who have used wikitext on mobile before, but have not switched to mobile VE and back again to wikitext, will potentially be included in the test group. (Reflected in the task's description.)

I don't think we distinguish between users who have used the editor and users who have switched. I think we store the preference as soon as any editor is opened.

@Esanders, interesting, ok. That would mean there are fewer edit sessions to analyze than we thought yesterday. @matmarex, I remember you checking in on this [1] quickly during yesterday's planning meeting: what did you find?

cc @DLynch

[1] Checking in on this: When are editing interface preferences stored on mobile?

ppelberg set Due Date to May 22 2019, 7:00 AM.May 17 2019, 2:14 AM

Restricted Application changed the subtype of this task from "Task" to "Deadline". · View Herald TranscriptMay 17 2019, 2:14 AM

ppelberg updated the task description. (Show Details)May 17 2019, 8:38 PM

JTannerWMF edited projects, added VisualEditor (Current work); removed VisualEditor.May 21 2019, 3:27 PM

JTannerWMF moved this task from Incoming to In progress on the VisualEditor (Current work) board.

During the planning meeting @matmarex said that we only store the preference after a switch has happened, rather than when the editor is opened.

To confirm and specify... we only set preferredEditor during calls to switch editor (to source, to visual).
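The behaviour described above can be sketched as follows. This is a minimal illustration in Python, not MobileFrontend's actual code: the class and method names are hypothetical, and only the `preferredEditor` name comes from the discussion. It shows that opening an editor leaves the preference unset, while an explicit switch stores it.

```python
from typing import Optional


class MobileEditorSession:
    """Hypothetical model of when the sticky preference gets written."""

    def __init__(self, default_editor: str, preferred_editor: Optional[str] = None):
        self.default_editor = default_editor      # "source" or "visual"
        self.preferred_editor = preferred_editor  # stands in for preferredEditor

    def open_editor(self) -> str:
        # Opening follows the stored preference if there is one, else the default.
        # Nothing is written here, so the contributor stays "without a preference".
        return self.preferred_editor or self.default_editor

    def switch_editor(self, target: str) -> str:
        # An explicit switch (to source, to visual) is the only place where
        # the preference is set.
        if target not in ("source", "visual"):
            raise ValueError(f"unknown editor: {target}")
        self.preferred_editor = target
        return target
```

Under this model, a contributor who has only ever opened the default editor is indistinguishable from one who has never edited, which is why the "no preference set" group can include experienced editors.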

@DLynch, thank you for checking and clarifying.

Two questions now: one for you and one for Neil...

Open questions

In T221195#5201746, @DLynch wrote:

To confirm and specify... we only set preferredEditor during calls to switch editor (to source, to visual).

@Neil_P._Quinn_WMF, does @DLynch's answer☝️ resolve: "Under what conditions is a preference stored?"

@DLynch: just to be doubly sure is the next sentence accurate? We only set preferredEditor (a contributor's preference for wikitext or VE) when a contributor explicitly switches from one editor to the other.

ppelberg updated the task description. (Show Details)May 21 2019, 6:07 PM

nshahquinn-wmf moved this task from Triage to Next Up on the Product-Analytics board.May 21 2019, 9:33 PM

nshahquinn-wmf mentioned this in T221192: VE mobile default: make experiment plan.May 21 2019, 10:21 PM

nshahquinn-wmf merged a task: T221192: VE mobile default: make experiment plan.May 21 2019, 11:15 PM

nshahquinn-wmf subscribed.

ppelberg updated the task description. (Show Details)May 22 2019, 12:33 AM

Decided

With the "Discussed/considered" in mind, @Neil_P._Quinn_WMF and I arrived at the following (task description updated)

We will include all contributors in the A/B test who:
- Do not have a sticky preference set
- Have made <100 overall edits
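The two inclusion criteria can be written as a single predicate. The sketch below is illustrative only: the `Contributor` record and its field names are assumptions made for this example, not MediaWiki data structures. The threshold of 100 overall edits is taken from the decision above.

```python
from dataclasses import dataclass
from typing import Optional

EDIT_COUNT_THRESHOLD = 100  # from the decision: fewer than 100 overall edits


@dataclass
class Contributor:
    # Hypothetical record; field names are illustrative only.
    preferred_editor: Optional[str]  # None means no sticky preference is set
    edit_count: int                  # overall edits, not mobile-only edits


def is_eligible_for_ab_test(contributor: Contributor) -> bool:
    """Return True when the contributor meets both inclusion criteria."""
    has_no_sticky_preference = contributor.preferred_editor is None
    is_below_threshold = contributor.edit_count < EDIT_COUNT_THRESHOLD
    return has_no_sticky_preference and is_below_threshold
```

Using the overall edit count rather than a mobile-only count keeps the check cheap, since overall counts are the only ones readily available on-the-fly.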

Discussed/considered

Our goal with the A/B test is to determine whether VE is a "better" default for contributors who do not have experience editing on mobile
Up to this point, we have been thinking we would include all contributors in the A/B test who do not have preferredEditor set
However, considering [preferredEditor](https://phabricator.wikimedia.org/T221195#5201746) is only set when a contributor explicitly chooses an editor, were we to continue down the "include everyone w/o a preference set" path, we would potentially be including many tenured editors in the test and, in the process, disrupting their existing workflows
We then thought about whether we could limit the test to contributors who do not have sticky preference set and meet a certain experience threshold (i.e. <100 mobile edits = include in A/B test; >100 mobile edits = exclude from A/B test)
However, we talked about how it's difficult to calculate anything other than overall edit counts on-the-fly

...and this is how we arrived at the "Decided"

ppelberg updated the task description. (Show Details)May 22 2019, 12:59 AM

Timing update

We need to push the due date of this task back to next week.

ppelberg changed Due Date from May 22 2019, 7:00 AM to May 29 2019, 7:00 AM.May 22 2019, 1:07 AM

ppelberg updated the task description. (Show Details)May 22 2019, 10:21 PM

22-May update

The blocking question right now is: How can we go about including anonymous contributors in the A/B test? (task description has been updated – see "Blocking questions")

Next steps

@Neil_P._Quinn_WMF to sync with engineering about coming up with a solve on the above

23-May | Update

@Neil_P._Quinn_WMF, we talked about the below during today's stand up. Engineering is ready to help. They just need a clear ask.

cc @DLynch @Esanders

In T221195#5206773, @ppelberg wrote:

22-May update

The blocking question right now is: How can we go about including anonymous contributors in the A/B test? (task description has been updated – see "Blocking questions")

Next steps

@Neil_P._Quinn_WMF to sync with engineering about coming up with a solve on the above

31-May | Update

@Neil_P._Quinn_WMF and I talked about the remaining "Blocking questions"

How can we go about including anonymous contributors in the A/B test?
The first time we see an anonymous user, we should use client-side logic to randomize them into either bucket and then set a cookie so that every EditAttemptStep event sent from them includes the appropriate value in the “cohort” field. If we’re doing this for anonymous editors, we should do it the same way for registered users as well.
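A rough sketch of that bucketing flow is below. It is written in Python for brevity, whereas the real logic would run client-side; the cookie name, the bucket labels, and the dict standing in for the browser's cookie jar are all assumptions for illustration. Only the "cohort" field name comes from the answer above.

```python
import random
from typing import Dict, Optional

BUCKET_COOKIE = "editor-abtest-bucket"          # hypothetical cookie name
BUCKETS = ("source-default", "visual-default")  # hypothetical cohort labels


def get_or_assign_bucket(cookies: Dict[str, str],
                         rng: Optional[random.Random] = None) -> str:
    """Reuse the stored bucket if present; otherwise randomize once and store it."""
    existing = cookies.get(BUCKET_COOKIE)
    if existing in BUCKETS:
        return existing
    rng = rng or random.Random()
    bucket = rng.choice(BUCKETS)     # 50/50 split between the two buckets
    cookies[BUCKET_COOKIE] = bucket  # persists until the cookie is cleared
    return bucket


def build_event(action: str, cookies: Dict[str, str]) -> Dict[str, str]:
    """Attach the cohort to every event so sessions can be compared by bucket."""
    return {"action": action, "cohort": get_or_assign_bucket(cookies)}
```

Because the same function serves anonymous and registered users, both groups are randomized identically, which matches the suggestion to handle them the same way.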

How would we store a persistent identifier for bucketed anons?
We have to run the test and conduct the analysis within 90 days of the test start date. We could use IP address + user agent as the ID, or create and store our own (more robust but harder and probably not worth the effort).

The task description has been updated to include the above.

Next steps

@DLynch, can you think of any other questions that need to be answered before we can begin instrumenting the A/B test?

ppelberg updated the task description. (Show Details)Jun 3 2019, 7:23 PM

In T221195#5226894, @ppelberg wrote:

31-May | Update

Next steps

@DLynch, can you think of any other questions that need to be answered before we can begin instrumenting the A/B test?

3-June Update

Just chatted with @DLynch. Follow up questions standing in the way of implementation (added to task description):

Is Legal okay with us creating a persistent identifier for anonymous contributors (doing this would make anonymous contributors less anonymous)?
Whether we store a persistent identifier based on IP + user agent or create and store our own, either option requires a modification to EditAttemptStep: how should changes to EditAttemptStep be implemented?

ppelberg updated the task description. (Show Details)Jun 3 2019, 10:39 PM

Outcome of call: generating a persistent-ish anon userid (cookie, expiring 90 days) and storing it as a negative integer in the user_id schema field would be acceptable for analytics.
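To make the proposal concrete, here is a minimal sketch of a persistent-ish anonymous ID with a 90-day lifetime, stored as a negative integer. It models the cookie as a plain Python object; the `AnonCookie` type, the function names, and the upper bound on the ID are assumptions for illustration, since the comment above only specifies the 90-day expiry and the negative sign.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

COOKIE_LIFETIME = timedelta(days=90)  # expiry stated in the call outcome
MAX_ID = 2**31 - 1                    # assumed bound; not given in the thread


@dataclass(frozen=True)
class AnonCookie:
    anon_id: int       # always negative, so it cannot collide with real user IDs
    expires: datetime


def new_anon_cookie(now: datetime) -> AnonCookie:
    """Issue a cookie holding a random negative integer in [-MAX_ID, -1]."""
    anon_id = -(secrets.randbelow(MAX_ID) + 1)
    return AnonCookie(anon_id=anon_id, expires=now + COOKIE_LIFETIME)


def get_anon_cookie(cookie: Optional[AnonCookie], now: datetime) -> AnonCookie:
    """Reuse an unexpired cookie; otherwise issue a fresh one."""
    if cookie is not None and now < cookie.expires:
        return cookie
    return new_anon_cookie(now)
```

Once the cookie expires, the same browser receives a new, unrelated ID, which is what makes the identifier only "persistent-ish".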

ppelberg updated the task description. (Show Details)Jun 6 2019, 1:52 AM

ppelberg updated the task description. (Show Details)Jun 10 2019, 10:21 PM

ppelberg added a subtask: T225209: Consult legal about A/B test instrumentation.Jun 12 2019, 6:25 PM

In T221195#5237611, @DLynch wrote:

Outcome of call: generating a persistent-ish anon userid (cookie, expiring 90 days) and storing it as a negative integer in the user_id schema field would be acceptable for analytics.

One problem: if we store it in user_id, we will have to start purging that field after 90 days, whereas we currently do not. We should instead add an anonymous_user_token field or similar to EditAttemptStep.

One problem: if we store it in user_id, we will have to start purging that field after 90 days, whereas we currently do not. We should instead add an anonymous_user_token field or similar to EditAttemptStep.

How's the purge configured? I don't know anything about the mechanism for it -- is it raw "empty this field after X time", or are there more conditions possible?

In T221195#5257108, @DLynch wrote:

One problem: if we store it in user_id, we will have to start purging that field after 90 days, whereas we currently do not. We should instead add an anonymous_user_token field or similar to EditAttemptStep.

How's the purge configured? I don't know anything about the mechanism for it -- is it raw "empty this field after X time", or are there more conditions possible?

It's just a straight "empty this field after 90 days"—the only other condition possible is "salt-and-hash" but I wouldn't want to do that here since it would mess with the registered user IDs. I just added anonymous_user_token to the schema, and I suggest we use that.

I'm not opposed to sticking the token in an existing field, but it would have to be a field that doesn't appear on the whitelist, and I don't see any good candidates.
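The purge behaviour discussed here can be sketched as a whitelist filter: fields on the whitelist survive, and everything else is emptied once an event is 90 days old. This is an illustration, not the actual sanitization job. The whitelist contents and the `timestamp` field are assumptions; the thread only establishes that `user_id` is currently retained and that `anonymous_user_token` must not be.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

RETENTION = timedelta(days=90)
# Assumed whitelist for illustration. user_id is kept indefinitely, while
# anonymous_user_token is deliberately absent so that it gets purged.
WHITELIST = frozenset({"timestamp", "user_id", "action"})


def purge_old_fields(event: Dict[str, Any], now: datetime) -> Dict[str, Any]:
    """Return a copy of the event; after 90 days, empty non-whitelisted fields."""
    if now - event["timestamp"] < RETENTION:
        return dict(event)
    return {key: (value if key in WHITELIST else None)
            for key, value in event.items()}
```

Keeping the token in its own field means the retention rule for `user_id` does not have to change, which is the reason given for adding `anonymous_user_token` rather than reusing `user_id`.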

Change 517369 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/WikimediaEvents@master] Bump EditAttemptStep schema

https://gerrit.wikimedia.org/r/517369

gerritbot added a project: Patch-For-Review.Jun 17 2019, 6:48 AM

Change 517380 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/MobileFrontend@master] Allow the default mobile editor to be configured

https://gerrit.wikimedia.org/r/517380

Change 517369 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Bump EditAttemptStep schema

https://gerrit.wikimedia.org/r/517369

ReleaseTaggerBot added a project: MW-1.34-notes (1.34.0-wmf.10; 2019-06-18).Jun 17 2019, 6:01 PM

Change 517380 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Allow the default mobile editor to be configured

https://gerrit.wikimedia.org/r/517380

Testing this patch beyond the superficial "it doesn't break the existing behavior" requires that you set the $wgMFDefaultEditor config variable to one of source, visual, preference, or abtest. The first two are obvious; preference means it should obey your user's desktop preference, and abtest should do a 50/50 split which is persistent for a given user (including anonymous users, until cookies are cleared). Super-deep validation includes making sure that abtest causes EventLogging to include the anonymous_user_token and bucket fields with appropriate values.
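For anyone checking the four modes, the expected resolution can be summarised in a small sketch. This is not the MobileFrontend implementation: the function name, the parameters, and the fallback to source when no usable desktop preference exists are assumptions for illustration. Only the four config values come from the comment above.

```python
from typing import Optional

VALID_EDITORS = ("source", "visual")


def resolve_default_editor(config_value: str,
                           desktop_preference: Optional[str] = None,
                           ab_bucket: Optional[str] = None) -> str:
    """Map a $wgMFDefaultEditor-style value to the editor that opens by default."""
    if config_value in VALID_EDITORS:
        # "source" and "visual" name the default editor directly.
        return config_value
    if config_value == "preference":
        # Obey the desktop preference; falling back to "source" is an assumption.
        return desktop_preference if desktop_preference in VALID_EDITORS else "source"
    if config_value == "abtest":
        # The bucket is assigned elsewhere (50/50) and persisted per user.
        if ab_bucket not in VALID_EDITORS:
            raise ValueError("abtest mode needs a bucket of 'source' or 'visual'")
        return ab_bucket
    raise ValueError(f"unknown default editor setting: {config_value}")
```

A quick manual check would be to call it once per mode, for example `resolve_default_editor("abtest", ab_bucket="visual")`, and compare the result with what the mobile editor actually opens.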

ppelberg mentioned this in T226262: Evaluate onboarding in mobile VE.Jun 21 2019, 3:01 PM

Change 518757 had a related patch set uploaded (by DLynch; owner: DLynch):
[VisualEditor/VisualEditor@master] Enhanced instrumentation for context items and inspectors

https://gerrit.wikimedia.org/r/518757

Change 518791 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/MobileFrontend@master] VE-as-default abtest should only apply to users with <100 edits

https://gerrit.wikimedia.org/r/518791

Change 518791 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] VE-as-default abtest should only apply to users with <100 edits

https://gerrit.wikimedia.org/r/518791

ReleaseTaggerBot edited projects, added MW-1.34-notes (1.34.0-wmf.11; 2019-06-26); removed MW-1.34-notes (1.34.0-wmf.10; 2019-06-18).Jun 25 2019, 2:01 AM

In T221195#5279409, @gerritbot wrote:

Change 518757 had a related patch set uploaded (by DLynch; owner: DLynch):
[VisualEditor/VisualEditor@master] Enhanced instrumentation for context items and inspectors

https://gerrit.wikimedia.org/r/518757

This patch is actually for T221252: Enhance instrumentation for context items and inspectors.

ppelberg mentioned this in T226687: Determine decision points for VE as default.Jun 27 2019, 4:59 AM

We received Legal approval. Leighanna's response: It sounds like you've come up with a good, privacy-protective experiment! The plan for implementing and retaining the data from anonymous_user_token sounds good to me. You're good to go.

ppelberg mentioned this in T225209: Consult legal about A/B test instrumentation.Jun 27 2019, 11:39 PM

ppelberg closed subtask T225209: Consult legal about A/B test instrumentation as Resolved.

ppelberg mentioned this in T221196: Set mobile VE as default for target wikis in A/B test.Jun 28 2019, 3:21 PM

ppelberg mentioned this in T227002: Make sure VE is mobile default on select wikis.Jul 1 2019, 4:04 PM

nshahquinn-wmf triaged this task as High priority.Jul 12 2019, 5:02 PM

This has been done; we're creating a new task to document the measurement plan

kzimmerman merged a task: T221187: VE mobile default: create measurement specifications and experiment plan.Aug 20 2019, 6:35 PM

kzimmerman mentioned this in T230827: Write up experiment plan.

ppelberg mentioned this in T235104: QA for EventLogging patch.Oct 9 2019, 3:36 PM

ppelberg mentioned this in T236337: Create new way to bucket contributors .Oct 24 2019, 12:15 AM

ppelberg mentioned this in T221198: VE mobile default: analyze A/B test results.Jun 1 2020, 10:11 PM

nettrom_WMF mentioned this in T292209: Update editattemptstep schema documentation for the anonymous_user_token field.Sep 30 2021, 3:57 PM

MNeisler mentioned this in T291307: Implement New Discussion Tool A/B test bucketing.Dec 13 2021, 6:21 PM
