VE mobile default: design A/B test
Closed, Resolved | Public | May 29 2019

Description

Test principles

We will include all contributors in the A/B test who:

  • Do not have a sticky preference set
  • Have made <100 overall edits
  • Are anonymous or registered users
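
As a rough illustration only, the inclusion criteria above could be written as a single predicate. The `Contributor` type and its field names are hypothetical and not taken from any extension; treating anonymous users as having an edit count of 0 is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the criteria above: "<100 overall edits".
EDIT_COUNT_THRESHOLD = 100


@dataclass
class Contributor:
    """Hypothetical view of a contributor, for illustration only."""
    edit_count: int                  # overall edits; assumed 0 for anonymous users
    preferred_editor: Optional[str]  # None when no sticky preference is set


def is_eligible(contributor: Contributor) -> bool:
    """Return True when the contributor meets the inclusion criteria."""
    has_sticky_preference = contributor.preferred_editor is not None
    return (not has_sticky_preference
            and contributor.edit_count < EDIT_COUNT_THRESHOLD)
```

Anonymous and registered users go through the same check; only the sticky preference and the edit count matter.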

Bucketing criteria
The first time we see an anonymous user, we should use client-side logic to randomize them into either bucket and then set a cookie so that every EditAttemptStep event sent from them includes the appropriate value in the “bucket” field. If we’re doing this for anonymous editors, we should do it the same way for registered users as well.
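
A minimal sketch of the bucketing described above, written in Python for readability even though the real logic would run client-side. The cookie name, the bucket labels, and the dict standing in for the browser's cookie jar are all assumptions; only the "bucket" field name comes from this task.

```python
import random
from typing import Dict

# Hypothetical names, for illustration only.
BUCKET_COOKIE = "ab-test-bucket"
BUCKETS = ("source", "visual")


def get_or_assign_bucket(cookies: Dict[str, str], rng: random.Random) -> str:
    """Return the stored bucket, assigning one at random on first sight."""
    bucket = cookies.get(BUCKET_COOKIE)
    if bucket not in BUCKETS:
        bucket = rng.choice(BUCKETS)     # 50/50 split between the two buckets
        cookies[BUCKET_COOKIE] = bucket  # persist so later events agree
    return bucket


def build_event(cookies: Dict[str, str], rng: random.Random, action: str) -> Dict[str, str]:
    """Attach the bucket to an EditAttemptStep-style event payload."""
    return {"action": action, "bucket": get_or_assign_bucket(cookies, rng)}
```

Because the bucket is read back from the cookie before any new random draw, every event from the same browser carries the same value until the cookie is cleared.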

Storing persistent identifiers for bucketed anons
We have to run the test and conduct the analysis within 90 days of the test start date. We could use IP address + user agent as the ID, or create and store our own (more robust but harder and probably not worth the effort).

Research questions*

Total number of completed edits: Do contributors in one test group complete more edits than contributors in the other test group?
Time to save an edit: Do contributors in one test group complete their edits more quickly than contributors in the other test group? This is a metric we would need to look at alongside other measures, like the size of the edits being made.
Editor retention: Are contributors in one test group more likely to come back to edit again than contributors in the other test group?
Edit quality: Are contributors’ edits in one test group more likely to be reverted than contributors’ edits in another test group?
Disruption: Are contributors in one test group switching between editing interfaces more often than contributors in another test group? (i.e. people fleeing back to wikitext.)

*See T221187
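
To make the first research question concrete, here is a small sketch of how completed edits per test group might be counted from logged events. It assumes each event is a dict with "action" and "bucket" keys and that a completed edit is logged with the action value "saveSuccess"; the actual measurement plan may define this differently.

```python
from collections import Counter
from typing import Iterable, Mapping


def completed_edits_per_bucket(events: Iterable[Mapping[str, str]]) -> Counter:
    """Count events that represent a saved edit, grouped by test bucket."""
    counts: Counter = Counter()
    for event in events:
        # Assumption: "saveSuccess" marks a completed edit.
        if event.get("action") == "saveSuccess":
            counts[event["bucket"]] += 1
    return counts
```

Raw counts like these would still need to be read alongside the number of sessions per group before drawing any conclusion.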

Blocking questions

  • Is legal okay with us creating a persistent identifier for anonymous contributors (doing this would make anonymous contributors less anonymous)?
  • Whether we decide to store a persistent identifier based on IP + user agent or create and store our own requires a modification to EditAttemptStep: how should changes to EditAttemptStep be implemented?

Open questions

  • Under what conditions are contributors' editing interface preferences set? See T221195#5201746
  • How will we bucket users if we want to include contributors who are not logged in?
  • What level of precision is appropriate for this A/B test?
  • What wikis are being included in this A/B test? See T222803
  • Confirm: we can distinguish between contributors who do and do not have a sticky editing interface preference set? Yes
  • Confirm: what impact – in terms of scale – does including all contributors who do not have a sticky preference set have on the number of edit sessions?
  • How long will the test last?
  • How are we going to monitor the test? What will trigger us to interrupt the test?
  • What is the range of actions we could take after the test concludes? What will determine which action(s) we take?
  • What could we do to exclude editors from our edit completion rate numbers who we assume do not have any intention of completing an actual edit?

"Done"

  • A phabricator task is created that specifies how engineering should implement/instrument the A/B test

Event Timeline

ppelberg renamed this task from Design "VE as mobile default" A/B test to VE mobile default: design A/B test. Apr 17 2019, 7:11 AM
ppelberg updated the task description.
Update | 14-May

@Neil_P._Quinn_WMF and I discussed:

  • The sample size for this A/B test is going to be further limited by the fact that making VE the default will not override existing mobile contributors' sticky preferences. Said another way: the only contributors who will be able to be included in this test are those contributors who have not yet opened the editor on their mobile device
    • Considering the small sample size and the non-invasive nature of making VE the default [1], we ought to consider including more wikis in the test
  • Test group bucketing: @Neil_P._Quinn_WMF to explore bucketing contributors using a cookie

Next steps

  • @Neil_P._Quinn_WMF to figure out whether bucketing contributors into test group by browser cookies will work for this experiment
  • @ppelberg to check with @Whatamidoing-WMF whether she is comfortable increasing the number of wikis we include in this A/B test

  1. Non-invasive nature: existing mobile contributors' mobile editing workflows will not be affected (sticky preferences) and we are not introducing any new functionality

One of the open questions about the A/B test is scale: What can we do to make sure we have enough edit sessions to draw meaningful conclusions from the A/B test with a reasonable level of precision and within a reasonable amount of time?

Prior to today's Planning Meeting, we'd been thinking to only include new contributors in the A/B test in order to minimize the disruptiveness of the change. Considering how this decision would further limit the number of sessions we can analyze, we started to think about including more wikis in the A/B test.

However, today @DLynch suggested that, since a sticky preference is only set after an editor explicitly chooses one editor (wikitext or VE) (thanks @matmarex for confirming), perhaps it makes sense to also include all contributors [to the target wikis we select] in the A/B test who do not have a sticky preference set (read: they've followed the default presented to them and have not made a deliberate choice about which editing interface they prefer).

Resulting questions

  • Confirm: we can distinguish between contributors who do and do not have a sticky editing interface preference set (added to task description)
  • Confirm: what impact – in terms of scale – does including all contributors who do not have a sticky preference set have on the number of edit sessions? (added to task description)

Update | 16-May

  • Sticky preferences: @Whatamidoing-WMF and I talked today and decided there's no issue with including all contributors who do not have a sticky preference set into the A/B test. This means that contributors who have used wikitext on mobile before, but have not switched to mobile VE and back again to wikitext, will potentially be included in the test group. (reflected in task's description)
  • Expanding list of wikis to be included in A/B test...

I don't think we distinguish between users who have used the editor and users who have switched. I think we store the preference as soon as any editor is opened.

@Esanders, interesting, ok. That would mean there are fewer edit sessions to analyze than we thought yesterday. @matmarex, I remember you checking in on this [1] quickly during yesterday's planning meeting: what did you find?

cc @DLynch


  1. Checking in on this: When are editing interface preferences stored on mobile?
ppelberg set Due Date to May 22 2019, 7:00 AM. May 17 2019, 2:14 AM
Restricted Application changed the subtype of this task from "Task" to "Deadline". May 17 2019, 2:14 AM

During the planning meeting @matmarex said that we only store the preference after a switch has happened, rather than when the editor is opened.

To confirm and specify... we only set preferredEditor during calls to switch editor (to source, to visual).

@DLynch, thank you for checking and clarifying.

Two questions now: one for you and one for Neil...

Open questions

To confirm and specify... we only set preferredEditor during calls to switch editor (to source, to visual).

  • @Neil_P._Quinn_WMF, does @DLynch's answer☝️ resolve: "Under what conditions is a preference stored?"
  • @DLynch: just to be doubly sure, is the next sentence accurate? We only set preferredEditor (a contributor's preference for wikitext or VE) when a contributor explicitly switches from one editor to the other.

Decided

With the "Discussed/considered" in mind, @Neil_P._Quinn_WMF and I arrived at the following (task description updated)

  • We will include all contributors in the A/B test who:
    • Do not have a sticky preference set
    • Have made <100 overall edits

Discussed/considered

  • Our goal with the A/B test is to determine whether VE is a "better" default for contributors who do not have experience editing on mobile
  • Up to this point, we have been thinking we would include all contributors in the A/B test who do not have preferredEditor set
  • However, considering [preferredEditor](https://phabricator.wikimedia.org/T221195#5201746) is only set when a contributor explicitly chooses an editor, were we to continue down the "include everyone w/o a preference set" path, we would potentially be including many tenured editors in the test and, in the process, disrupting their existing workflows
  • We then thought about whether we could limit the test to contributors who do not have sticky preference set and meet a certain experience threshold (i.e. <100 mobile edits = include in A/B test; >100 mobile edits = exclude from A/B test)
  • However, we talked about how it's difficult to calculate anything other than overall edit counts on-the-fly

...and this is how we arrived at the "Decided"

Timing update

We need to move the timing of this task back to next week.

ppelberg changed Due Date from May 22 2019, 7:00 AM to May 29 2019, 7:00 AM. May 22 2019, 1:07 AM

22-May update

The blocking question right now is: How can we go about including anonymous contributors in the A/B test? (task description has been updated – see "Blocking questions")

Next steps

  • @Neil_P._Quinn_WMF to sync with engineering about coming up with a solution to the above

23-May | Update

@Neil_P._Quinn_WMF, we talked about the 22-May blocking question (how to include anonymous contributors in the A/B test) during today's stand-up. Engineering is ready to help. They just need a clear ask.

cc @DLynch @Esanders

31-May | Update

@Neil_P._Quinn_WMF and I talked about the remaining "Blocking questions"

How can we go about including anonymous contributors in the A/B test?
The first time we see an anonymous user, we should use client-side logic to randomize them into either bucket and then set a cookie so that every EditAttemptStep event sent from them includes the appropriate value in the “cohort” field. If we’re doing this for anonymous editors, we should do it the same way for registered users as well.

How would we store a persistent identifier for bucketed anons?
We have to run the test and conduct the analysis within 90 days of the test start date. We could use IP address + user agent as the ID, or create and store our own (more robust but harder and probably not worth the effort).

The task description has been updated to include the above.

Next steps

  • @DLynch, can you think of any other questions that need to be answered before we can begin instrumenting the A/B test?

3-June Update

Just chatted with @DLynch. Follow up questions standing in the way of implementation (added to task description):

  • Is legal okay with us creating a persistent identifier for anonymous contributors (doing this would make anonymous contributors less anonymous)?
  • Whether we decide to store a persistent identifier based on IP + user agent or create and store our own requires a modification to EditAttemptStep: how should changes to EditAttemptStep be implemented?

Outcome of call: generating a persistent-ish anon userid (cookie, expiring 90 days) and storing it as a negative integer in the user_id schema field would be acceptable for analytics.

One problem: if we store it in user_id, we will have to start purging that field after 90 days, whereas we currently do not. We should instead add an anonymous_user_token field or similar to EditAttemptStep.

How's the purge configured? I don't know anything about the mechanism for it -- is it raw "empty this field after X time", or are there more conditions possible?

It's just a straight "empty this field after 90 days"—the only other condition possible is "salt-and-hash" but I wouldn't want to do that here since it would mess with the registered user IDs. I just added anonymous_user_token to the schema, and I suggest we use that.

I'm not opposed to sticking the token in an existing field, but it would have to be a field that doesn't appear on the whitelist, and I don't see any good candidates.
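
A hedged sketch of how a client might produce the token discussed above: a random value kept in a cookie that expires after 90 days and sent in the anonymous_user_token field rather than in user_id. The cookie name, the token format, and the use of user_id 0 to mean "anonymous" are assumptions; the dict stands in for a real cookie store.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

TOKEN_COOKIE = "anon-user-token"     # hypothetical cookie name
TOKEN_LIFETIME = timedelta(days=90)  # "cookie, expiring 90 days"


def get_or_create_token(cookies: Dict[str, Dict[str, Any]], now: datetime) -> str:
    """Reuse the stored token while it is valid; otherwise mint a new one."""
    entry = cookies.get(TOKEN_COOKIE)
    if entry is None or entry["expires"] <= now:
        entry = {"value": secrets.token_hex(16), "expires": now + TOKEN_LIFETIME}
        cookies[TOKEN_COOKIE] = entry
    return entry["value"]


def tag_event(event: Dict[str, Any], cookies: Dict[str, Dict[str, Any]],
              now: datetime, user_id: int) -> Dict[str, Any]:
    """Copy the event, adding the token only for anonymous users."""
    tagged = dict(event)
    tagged["user_id"] = user_id
    if user_id == 0:  # assumption: 0 means "not logged in"
        tagged["anonymous_user_token"] = get_or_create_token(cookies, now)
    return tagged
```

Keeping the token in its own field leaves user_id untouched for registered users, so a 90-day purge can apply to anonymous_user_token alone.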

Change 517369 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/WikimediaEvents@master] Bump EditAttemptStep schema

https://gerrit.wikimedia.org/r/517369

Change 517380 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/MobileFrontend@master] Allow the default mobile editor to be configured

https://gerrit.wikimedia.org/r/517380

Change 517369 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Bump EditAttemptStep schema

https://gerrit.wikimedia.org/r/517369

Change 517380 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Allow the default mobile editor to be configured

https://gerrit.wikimedia.org/r/517380

Testing this patch beyond the superficial "it doesn't break the existing behavior" requires that you set the $wgMFDefaultEditor config variable to one of source, visual, preference, or abtest. The first two are obvious; preference means it should obey your user's desktop preference, and abtest should do a 50/50 split which is persistent for a given user (including anonymous users, until cookies are cleared). Super-deep validation includes making sure that abtest causes eventlogging to include the anonymous_user_token and bucket fields with appropriate values.
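
For anyone validating the four modes, the behaviour described above can be summarised roughly as follows. This is not the MobileFrontend code: the function and parameter names are hypothetical, and it assumes the desktop preference and the persisted A/B bucket have already been resolved to either "source" or "visual".

```python
VALID_MODES = ("source", "visual", "preference", "abtest")


def resolve_default_editor(mode: str, desktop_preference: str, ab_bucket: str) -> str:
    """Pick the editor to open by default for a given config mode.

    desktop_preference and ab_bucket are each assumed to be "source" or "visual".
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown default editor mode: {mode!r}")
    if mode == "preference":
        return desktop_preference  # obey the user's desktop preference
    if mode == "abtest":
        return ab_bucket           # persisted 50/50 assignment
    return mode                    # "source" or "visual": fixed default
```

When checking abtest, the same user should get the same answer across page loads until their cookies are cleared.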

Change 518757 had a related patch set uploaded (by DLynch; owner: DLynch):
[VisualEditor/VisualEditor@master] Enhanced instrumentation for context items and inspectors

https://gerrit.wikimedia.org/r/518757

Change 518791 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/MobileFrontend@master] VE-as-default abtest should only apply to users with <100 edits

https://gerrit.wikimedia.org/r/518791

Change 518791 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] VE-as-default abtest should only apply to users with <100 edits

https://gerrit.wikimedia.org/r/518791

Change 518757 (https://gerrit.wikimedia.org/r/518757, "Enhanced instrumentation for context items and inspectors") is actually for T221252: Enhance instrumentation for context items and inspectors.

JTannerWMF subscribed.

We received Legal approval. Leighanna's response: It sounds like you've come up with a good, privacy-protective experiment! The plan for implementing and retaining the data from anonymous_user_token sounds good to me. You're good to go.

kzimmerman moved this task from Next Up to Doing on the Product-Analytics board.
kzimmerman subscribed.

This has been done; we're creating a new task to document the measurement plan