Add DiscussionTools a/b test bucket information to events from VisualEditor and WikiEditor.
Closed, Resolved (Public)

Description

This task is about making it possible for us to identify any non-discussion tool EditAttemptStep events by users in the AB test.

More context: T268193#6781374.

Requirements

  • Be able to distinguish between A/B test and non-A/B test events within EditAttemptStep
  • Within the Reply Tool A/B test, be able to distinguish between Discussion Tool and non-Discussion Tool events.
  • Within the Reply Tool A/B test, be able to distinguish all events logged for the control group and the treatment group (This would include both Discussion Tools and VisualEditor/WikiEditor events)

Deployment

  • @ppelberg to document whether the patch to implement the Requirements above needs to be deployed via a backport window and, if so, the date on which this would ideally happen.

Done

  • Requirements section above is complete
  • A patch has been deployed that meets what's described in the Requirements section above
  • @MNeisler has verified data is being logged as we expect.

Event Timeline

DLynch renamed this task from Add event to be able to identify A/B test events within EditAttemptStep to Add DiscussionTools a/b test bucket information to events from VisualEditor and WikiEditor. Jan 27 2021, 6:35 PM
ppelberg renamed this task from Add DiscussionTools a/b test bucket information to events from VisualEditor and WikiEditor to Add event to be able to identify non-A/B test events within EditAttemptStep. Jan 27 2021, 6:37 PM
ppelberg reassigned this task from DLynch to MNeisler.
ppelberg updated the task description.
ppelberg renamed this task from Add event to be able to identify non-A/B test events within EditAttemptStep to Add event to be able to identify non-Discussion Tool A/B test events within EditAttemptStep. Jan 27 2021, 8:15 PM
ppelberg reassigned this task from MNeisler to DLynch.
ppelberg updated the task description.
ppelberg moved this task from Backlog to Triaged on the DiscussionTools board.

META
@DLynch, this task should be ready to be worked on. @MNeisler and I just met and populated the task description's Requirements section.

DLynch renamed this task from Add event to be able to identify non-Discussion Tool A/B test events within EditAttemptStep to Add DiscussionTools a/b test bucket information to events from VisualEditor and WikiEditor. Jan 27 2021, 9:30 PM

Change 660074 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/WikiEditor@master] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/660074

Change 660076 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/VisualEditor@master] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/660076

Change 660076 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/660076

Change 660074 merged by jenkins-bot:
[mediawiki/extensions/WikiEditor@master] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/660074

It looks like the necessary patches have been merged, but this is still blocked on 1.36.0-wmf.29 being fully rolled out.

I'll plan to prioritize this task as soon as it's unblocked and new events are logged.

Change 663403 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/VisualEditor@wmf/1.36.0-wmf.27] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/663403

Change 663404 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/WikiEditor@wmf/1.36.0-wmf.27] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/663404

Change 663403 merged by Urbanecm:
[mediawiki/extensions/VisualEditor@wmf/1.36.0-wmf.27] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/663403

Change 663404 merged by Urbanecm:
[mediawiki/extensions/WikiEditor@wmf/1.36.0-wmf.27] Log the DiscussionTools a/b test bucket for relevant schemas

https://gerrit.wikimedia.org/r/663404

Mentioned in SAL (#wikimedia-operations) [2021-02-12T01:01:34Z] <urbanecm@deploy1001> Synchronized php-1.36.0-wmf.27/extensions/VisualEditor/: rEVEDc86cd00076c9: rEWEDde4a562d3bae: VE backports (T273096) (duration: 01m 15s)

I completed a post-deployment QA of these instrumentation changes and confirmed that they meet the requirements identified above, which I need in order to complete the analysis described in T252057. I have a few remaining open questions, which are noted below along with a summary of my findings:

CONFIRMED/PASSED CHECKS

  • We can distinguish events in the AB test (indicated by event.bucket = 'control' or event.bucket = 'test') and events not in the AB test (indicated by event.bucket = 'NULL').
  • We can distinguish between events logged to the Test versus Control groups within the AB test with the new instrumentation. We are logging a reasonable amount of attempts in each group among the expected experiment population.
  • Control and Test events are only logged for the wikis included in the AB test as expected.
  • We can distinguish between discussion tool events (event.type = 'discussiontools') and non-discussion tool events (event.type = 'page') for both the Test and Control groups. There are very few discussion tool attempts (18 attempts from 8 users) in the control group. This is expected as these events could only be recorded for users that manually enabled DiscussionTools.
  • Edit attempts by users in the AB test were recorded on 21 of the 22 participating wikis. A review of attempts for each participating wiki indicates that there have not yet been any edit attempts by a user included in the AB test on Amharic Wikipedia. As this is a smaller wiki, it seems possible that no user who met the AB test criteria has made an edit attempt yet.
  • Re-checked and confirmed that buckets are balanced for each participating wiki.
  • Confirmed that discussion tool events are oversampled starting on 12 Feb 2021 following the deployment of the config change. The increase in discussion tool events fits with a sampling rate change of 100%.
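
The checks above come down to two filters on each event. A minimal sketch of that classification follows, assuming each event is a plain dict with `bucket` and `integration` keys; the `classify` helper and the sample events are hypothetical illustrations, not the queries used in the QA notebook.

```python
from collections import Counter

def classify(event):
    """Return (ab_group, tool) for one EditAttemptStep-style event dict.

    Assumption: events in the A/B test carry bucket 'control' or 'test';
    any other value (missing, None, or the string 'NULL') means the event
    is outside the test.
    """
    bucket = event.get("bucket")
    ab_group = bucket if bucket in ("control", "test") else "not_in_test"
    # Assumption: the integration field separates DiscussionTools events
    # from VisualEditor/WikiEditor ('page') events.
    if event.get("integration") == "discussiontools":
        tool = "discussiontools"
    else:
        tool = "non_discussiontools"
    return ab_group, tool

# Hypothetical sample events, for illustration only.
sample_events = [
    {"bucket": "test", "integration": "discussiontools"},
    {"bucket": "test", "integration": "page"},
    {"bucket": "control", "integration": "page"},
    {"bucket": None, "integration": "page"},
]

counts = Counter(classify(e) for e in sample_events)
for key, n in sorted(counts.items()):
    print(key, n)
```

Counting by the returned pair gives the per-group, per-tool breakdown that the Requirements section asks for.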

OPEN QUESTIONS:

  • About 63% of logged-in users on the participating wikis that have attempted an edit since the AB test deployment were not included in the AB test, which indicates that the targeting for the AB test is pretty restrictive. Based on the AB test criteria, excluded logged-in users would include anyone that has used the reply tool before (defined as people whose discussiontools-editmode preference is not empty).
    • UPDATE: I spoke with @ppelberg about this finding yesterday, and we decided to leave the AB test criteria as is. Based on the current rate of daily AB events logged, I believe we should have sufficient data to complete the analysis in 2 to 3 weeks. While the criteria may be restrictive, they ensure that we are not including any users who have used the Reply tool before, which is important for accurately comparing the two test populations. If needed, we can start the analysis a few weeks later than originally planned (25-Feb) to collect more data.
  • There appear to be close to the same number of non-discussion tool full page edit attempts (event.integration = 'page', event.init_type = 'page', event.action = 'init') to talk pages in both the control and test groups (see table below). I would expect far fewer non-discussion tool edit attempts in the test group than in the control group (especially on talk pages), as users in the test group are shown the reply tool by default. @DLynch - Any ideas as to why this might be?

Edit Attempts by users in AB Test on Talk Pages by Event Type and Experiment Group
(Data reflects events logged from 12 February 2021 through 17 February 2021 across all participating AB test wikis)

event_type       experiment_group  # of distinct users  number of talk page edit attempts
discussiontools  control           4                    9
discussiontools  test              762                  1647
page             control           171                  241
page             test              178                  234

Please see my QA notebook for further details. The checks are also documented in the QA doc.

I would expect there to be far fewer number of non-discussion tool edit attempts in the test group compared to the control group (especially on talk pages) as they are shown the reply tool as default. @DLynch - Any ideas as to why this might be?

Speculating here, but this could just be people that aren't noticing the reply link. Someone who's used to replying to comments via the source editor might be sticking with what they know, rather than poking at new things. In that case, the fairly even split between control and test would be consistent with a user-type who's just not paying attention to the new functionality. After all, being assigned to the test group doesn't stop any existing methods of replying from working, or even interfere with them at all.

An alternative could be that there's some fairly prevalent way to view a comment and navigate to the source editor that we're not considering, and in which we're not presenting the tool to the user. A further alternative could be that the tool is not functioning for a large group, I guess.

There being a lot more distinct users causing discussiontools events would be consistent with our thesis that the new tool is a lot easier to use, though I wasn't expecting it to be quite so unbalanced.

Thanks @DLynch.

After all, being assigned to the test group doesn't stop any existing methods of replying from working, or even interfere with them at all.

A little unique for an AB test but this might actually provide an interesting data point to review the impact of displaying the new reply link on the use of the new link vs existing methods.

@ppelberg -- I re-checked the data and I don't see any indication of a bug in the instrumentation changes or bucketing implementation that might lead to the trends we are seeing so I think we can resolve this task pending your review. Additional data and analysis in T252057 will hopefully help provide some insight into what might be leading some users in the test group to continue to use existing methods.

Reassigning this task to you for sign-off but let me know if you have any questions.

The findings you shared in T273096#6842281 look good to me, @MNeisler – thank you for bringing this all together.


A resulting question regarding there being, "...close to the same number of non-discussion tool full page edit attempts...to talk pages in both the control and test groups...":

Megan: in the context of EditAttemptStep does page include edit attempts initiated via section edit links? Schema:EditAttemptStep leads me to think "yes," tho I wanted to be sure.

It's the integration, so yes. (You could distinguish section ones from init_type. But only for source mode, because we still haven't released visual section editing anywhere.)

Yes, that's correct. event.integration = 'page' includes edit attempts via both section links and full page links. However, I realized the data reflected in T273096#6842281 (Edit Attempts by users in AB Test on Talk Pages by Event Type and Experiment Group) was filtered to only include init_type = 'page' (sorry for not clarifying that in my comments).

I reran the query to show the breakdown between section and full page edits for each experiment group and integration type. See updated numbers below.

Edit Attempts by users in AB Test on Talk Pages by Event Type and Experiment Group
(Data reflects events logged from 12 February 2021 through 24 February 2021 across all participating AB test wikis)

integration      init_type  experiment_group  # of distinct users  number of talk page edit attempts
discussiontools  page       control           5                    10
discussiontools  page       test              787                  1722
page             page       control           201                  292
page             page       test              205                  270
page             section    control           176                  255
page             section    test              143                  187

We're still seeing close to the same number of non-discussion tool edit attempts logged for both the control and test groups for edit attempts initiated for the whole talk page. When looking only at edit attempts initiated for a section of a talk page, we see a little more of the expected discrepancy in non-discussion tool attempts between the test and control groups. There were 255 attempts in the control group and 187 attempts in the test group.