
Run A/B test to evaluate impact of Reply tool
Open, Medium, Public

Description

This test is intended to help us understand what impact the Reply Tool is having on Junior Contributors' likelihood to start (activation) and continue (retention) participating on Wikipedia talk pages.

Decision to be made

The decision this analysis is intended to help us make:
Should the Reply tool be offered to all people, at all wikis, as an opt-out user preference?

Hypotheses

To help evaluate the impact of the Reply tool, we would like to analyze whether adding a more intuitive workflow for replying to specific comments to Wikipedia talk pages:

| ID | Hypothesis | Metric(s) for evaluation |
| --- | --- | --- |
| KPI | ...causes a greater percentage of Junior Contributors to publish the comments they start without a significant increase in disruption (see "Guardrail" below). | Comment completion rate: of the people who click the [ reply ] link (action = init), the percentage who successfully publish the comment they were drafting (action = saveSuccess). |
| Guardrail | ...does not cause a significant increase in the number of disruptive edits being made to talk pages. | The number of edits made to talk pages that are reverted within 48 hours. The number of editors who are blocked after making an edit to a talk page. |
| Curiosity #1 | ...causes a greater number of Junior Contributors to start participating productively on talk pages. | The number of distinct Junior Contributors who make at least one edit to a page in a talk namespace that is not reverted within 48 hours. |
| Curiosity #2 | ...causes a greater percentage of Junior Contributors to continue participating productively on talk pages. | The percentage of Junior Contributors who make at least one edit to a page in a talk namespace that is not reverted within 48 hours in each of the following time intervals: 2 to 7 days after making their first edit (read: within the first week), 8 to 14 days after making their first edit (read: within the second week), and 15 to 30 days after making their first edit (read: within the third or fourth weeks). |
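As an illustration of the KPI metric, here is a minimal sketch of a per-person comment completion rate. The action names `init` and `saveSuccess` come from the table above; the event shape (a list of `(user_id, action)` pairs) and the function name are assumptions made for the example and do not describe the actual instrumentation schema.

```python
from typing import Iterable, Tuple


def comment_completion_rate(events: Iterable[Tuple[str, str]]) -> float:
    """Share of people who logged `init` and also logged `saveSuccess`.

    `events` is a hypothetical stream of (user_id, action) pairs.
    """
    started = set()    # people who clicked [ reply ]
    published = set()  # people who published a comment
    for user_id, action in events:
        if action == "init":
            started.add(user_id)
        elif action == "saveSuccess":
            published.add(user_id)
    if not started:
        return 0.0
    # Only count publishers who also appear in the `init` population.
    return len(started & published) / len(started)


events = [
    ("u1", "init"), ("u1", "saveSuccess"),
    ("u2", "init"),
    ("u3", "init"), ("u3", "saveSuccess"),
    ("u4", "init"),
]
print(comment_completion_rate(events))  # 0.5
```

Counting people rather than attempts means that someone who abandons one draft but publishes another still counts as a completion in this sketch.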

Decision matrix

| ID | Scenario | Plan of action |
| --- | --- | --- |
| 1. | People are "meaningfully" more likely to publish edits using the Reply Tool than they are using full-page editing | Continue with plans to make the Reply Tool available at all Wikipedias, by default. See T269062 for more detail. |
| 2. | People are "meaningfully" less likely to publish edits using the Reply Tool than they are using full-page editing | Investigate where within the Reply Tool comment funnel people are dropping off and what could be contributing to this drop-off. In parallel, we will pause plans to make the Reply Tool available at all Wikipedias by default. |
| 3. | People are as likely to publish edits using the Reply Tool as they are using full-page editing | Continue with plans to offer the feature as an opt-out preference at all Wikipedias, considering we have meaningful qualitative feedback and quantitative data that suggest the tool is leading people to find participating on talk pages easier / more efficient.[ii] |

Open questions

  1. Should edits to non-talk namespace pages be included in this analysis?
  2. What wikis should be included in the test? See: T267379.

Done

  • A report is published that evaluates the Hypotheses listed above

i. Editor experience buckets

  • Logged out
  • 0 cumulative edits
  • 1-4 cumulative edits
  • 5-99 cumulative edits
  • 100-999 cumulative edits
  • 1000+ cumulative edits
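
For analysis, the buckets above could be assigned with a small helper like the one below. This is a sketch only: the function name is hypothetical, and using `None` to represent a logged-out contributor is an assumption made for the example.

```python
from typing import Optional


def experience_bucket(cumulative_edits: Optional[int]) -> str:
    """Map a cumulative edit count to one of the buckets listed above.

    Assumption: `None` stands for a logged-out contributor.
    """
    if cumulative_edits is None:
        return "Logged out"
    if cumulative_edits < 0:
        raise ValueError("cumulative_edits must be non-negative")
    if cumulative_edits == 0:
        return "0 cumulative edits"
    if cumulative_edits <= 4:
        return "1-4 cumulative edits"
    if cumulative_edits <= 99:
        return "5-99 cumulative edits"
    if cumulative_edits <= 999:
        return "100-999 cumulative edits"
    return "1000+ cumulative edits"


print(experience_bucket(3))     # 1-4 cumulative edits
print(experience_bucket(None))  # Logged out
```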

ii. An example of said "quantitative data": T247139

Event Timeline

Task description update
I've added a Hypotheses section to the task description. It contains the metrics we will use to compare the two test groups and, by extension, determine the impact the Reply Tool is having on Junior Contributor activation and retention.

Note: the above is the outcome of the conversation @MNeisler and I had on 4-Nov wherein we revisited the Reply Tool measurement plan and identified the metrics we will prioritize as part of this A/B test.

Task description update
I've updated the task description to reflect the updates to the test KPI @MNeisler and I decided upon during the meeting we had on 2-December.

ppelberg updated the task description.

Deployment update
The A/B test officially started today, 11-February-2021. [i]

This means the analysis can "start" as early as 25-February per the conversation @MNeisler and I had yesterday (10-February).


i. T273554#6825381

I read the A/B test has officially started. The task T273406 hasn't been updated yet. Is it okay if I inform Dutch Wikipedia?

> I read the A/B test has officially started... Is it okay if I inform Dutch Wikipedia?

I'm sorry for the delayed response, @AdHuikeshoven. Yes, it is okay to inform Dutch Wikipedia.

Note: it looks like @Whatamidoing-WMF has already made an announcement at nl.wiki per T273406#6827497.

Question: should the KPI be “percentage of people” or “percentage of edits/posts”? If a person makes a mixture of successful and unsuccessful posts, do they get counted as a success or a failure overall?

You could do something funky like number of people weighted by their personal success rate. Person A posts 30 comments, all successful, is scored as a 1; person B makes two successful posts also scores 1; person C has one success and one fail scores 0.5. (It's people-focussed, and avoids the problem with edit counts where prolific commenters would skew the results.)
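
A rough sketch of that weighting, using the three people from the example. The input shape (a list of `(person, succeeded)` pairs) and the function name are assumptions for illustration, not the task's agreed KPI.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_person_success(attempts: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return each person's personal success rate (successes / attempts)."""
    totals = defaultdict(lambda: [0, 0])  # person -> [successes, attempts]
    for person, succeeded in attempts:
        totals[person][1] += 1
        if succeeded:
            totals[person][0] += 1
    return {person: s / n for person, (s, n) in totals.items()}


attempts = (
    [("A", True)] * 30                 # 30 posts, all successful
    + [("B", True)] * 2                # 2 posts, both successful
    + [("C", True), ("C", False)]      # 1 success, 1 failure
)
scores = per_person_success(attempts)
print(scores)                              # {'A': 1.0, 'B': 1.0, 'C': 0.5}
print(sum(scores.values()))                # 2.5 weighted "successful people"
print(sum(scores.values()) / len(scores))  # average personal success rate
```

Because each person contributes at most 1 to the total, A's 30 posts carry no more weight than B's 2.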

Or maybe you’re only interested in their first n attempts before they learn the ropes? But if someone keeps trying and gets better with time (instead of giving up) then that retention factor is important to count in.

Meta
Per the conversation @MNeisler and I had today, we are going to break this analysis into two parts:

  1. Report on the KPIs
    • Components: KPI and Guardrail metrics defined in the Hypotheses section of the task description.
  2. Full analysis
    • Components: Curiosity metrics defined in the Hypotheses section of the task description.
MNeisler triaged this task as Medium priority. Tue, Mar 2, 9:34 PM
MNeisler moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.