Maniphest T372015

Conduct analysis for Alt Text experiment 15 days and 30 days after experiment start
Open, Low, Public
Assigned To

None

Authored By

	HNordeenWMF
	Aug 7 2024, 8:49 PM

Description

Background

Release date: September 5th, 2024 - 7.5.8 (3979) and onwards
15 days: September 20, 2024
30 days: October 5, 2024
60 days: November 4, 2024
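
The checkpoint dates follow directly from the release date. A minimal sketch, assuming each checkpoint is counted in calendar days from September 5, 2024 (the function name `checkpoint_dates` is illustrative, not part of any existing tooling):

```python
from datetime import date, timedelta

RELEASE_DATE = date(2024, 9, 5)  # release date stated in this task


def checkpoint_dates(start, offsets=(15, 30, 60)):
    """Map each day offset to the calendar date that many days after start."""
    return {days: start + timedelta(days=days) for days in offsets}


for days, day in checkpoint_dates(RELEASE_DATE).items():
    print(f"{days} days: {day.isoformat()}")
```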

The Task

Compare results for experiment groups to control group data
Visualize and present the data in a way that is easily understandable to the team

Requirements

The data should be based on the metrics in the Epic

At 15 days:

Check metric-specific leading indicators:

100 edits with alt text values, from at least 25 unique editors. At least 25 edits are from newer editors
More than 15 unique editors have been assigned to each experiment group
70% task acceptance rate for group B and at least 10% acceptance rate for group C (acceptance rate = number of people who enter the flow / number of prompt impressions)
Revert rate for newer editors' edits in any single group does not exceed 18%
Add data into Grading Sheet
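
The 15-day leading indicators could be checked with something like the sketch below. The `GroupStats` fields and function names are hypothetical and would need to be mapped onto the real experiment data. It also assumes that "70% acceptance" means at least 70%, and that "each experiment group" means groups B and C.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Hypothetical per-group counts; field names are illustrative."""
    unique_editors: int
    prompt_impressions: int
    flow_entries: int          # people who entered the alt text flow
    newer_editor_edits: int
    newer_editor_reverts: int


def rate(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def leading_indicators(total_edits, unique_editors, newer_editor_edits, groups):
    """Evaluate each 15-day leading indicator; returns {indicator: passed}."""
    b, c = groups["B"], groups["C"]
    return {
        "at least 100 alt text edits": total_edits >= 100,
        "at least 25 unique editors": unique_editors >= 25,
        "at least 25 edits from newer editors": newer_editor_edits >= 25,
        "more than 15 editors per group": all(
            g.unique_editors > 15 for g in groups.values()
        ),
        # Assumption: the 70% target is read as "at least 70%".
        "group B acceptance": rate(b.flow_entries, b.prompt_impressions) >= 0.70,
        "group C acceptance": rate(c.flow_entries, c.prompt_impressions) >= 0.10,
        "newer editor revert rate within 18%": all(
            rate(g.newer_editor_reverts, g.newer_editor_edits) <= 0.18
            for g in groups.values()
        ),
    }
```

Returning one boolean per indicator keeps the result easy to copy into the Grading Sheet row by row.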

At 30 days (September 5 - October 5, 2024)

Measure KRs that require control group:

Edit return rate of editors in group B or C who have received an Alt text prompt does not differ from controls by more than 10%
Revert rate for edits from experiment groups does not exceed controls by more than 5 percentage points
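
A sketch of how these two comparisons against the control group might be computed. The revert rule is stated in percentage points. The return rate rule says only "10%", so whether it is relative or absolute is left as a parameter here rather than decided. Function names are illustrative.

```python
def return_rate_ok(treatment_rate, control_rate, tolerance=0.10, relative=True):
    """Check that the edit return rate stays within tolerance of control.

    relative=True  -> |treatment - control| / control <= tolerance
    relative=False -> |treatment - control| <= tolerance (percentage points)
    Which reading applies is an assumption to confirm with the team.
    """
    diff = abs(treatment_rate - control_rate)
    if relative:
        if control_rate == 0:
            return treatment_rate == 0
        diff /= control_rate
    return diff <= tolerance


def revert_rate_ok(treatment_rate, control_rate, max_excess_pp=5.0):
    """Check that the revert rate exceeds control by at most max_excess_pp points."""
    return (treatment_rate - control_rate) * 100 <= max_excess_pp
```

For example, `revert_rate_ok(0.12, 0.10)` passes because the experiment group is 2 percentage points above control, while `revert_rate_ok(0.20, 0.10)` fails at 10 points above.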

Measure curiosities:

What is the most common reason that users decide not to act on the prompt? (Survey responses)
Did the overall constructive activation rate in the iOS app increase when we made Image recommendations available to brand new editors?

Add data into Grading Sheet
Pull matching data to this tab from image recs edits during same time period, for es, pt, zh, and fr (September 5 - October 5, 2024)

At 60 days:

Key Indicators

60% of group B editors publish an additional edit with alt text for the image they were prompted on
4% of group C users add alt text when prompted after editing an article
Of group B editors who saw the treatment and then made a subsequent image recommendations edit within the next 15 days, 25% add alt text as part of that edit
200 images are improved with Alt text, by at least 50 unique editors
Add data into Grading Sheet
Pull matching data to this tab from image recs edits during same time period, for es, pt, zh, and fr (September 5 - November 4, 2024)

Decision Matrix for next steps:

If at least 70% of edits are scored a 3 or higher, we will scale the feature. If less than 70% of edits are scored a 3 or higher, we will improve guidance or use AI to better assist users.
If quality scores for newer editors are more than 50% worse than quality scores for experienced editors, we will not recommend this task be available to newer editors.
If at least 60% of respondents say they would use a feature that provides a feed of images in need of alt text, then we will have the confidence to pursue a feed of alt text suggested edits.
If 60% or more of respondents say they would be interested in similar edit notifications for articles they are working on, and 60% of respondents are satisfied with the feature (Group C survey responses), we share this information and consider future edit prompts.
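
The decision rules can be written as a small function so the same inputs always give the same recommendation. This is a sketch under stated assumptions: it uses a single 70% cutoff for the quality rule (exposed as `scale_threshold`), and it reads "more than 50% worse" as a newer editor mean score below half the experienced editor mean. All parameter names are hypothetical.

```python
def decision_matrix(
    share_scored_3_plus,
    newer_mean_score,
    experienced_mean_score,
    share_would_use_feed,
    share_interested,
    share_satisfied,
    scale_threshold=0.70,  # assumption: one cutoff separates "scale" from "improve"
):
    """Return the list of next steps implied by the decision rules."""
    decisions = []
    if share_scored_3_plus >= scale_threshold:
        decisions.append("scale the feature")
    else:
        decisions.append("improve guidance or use AI to better assist users")
    # Assumption: "more than 50% worse" means below half the experienced mean.
    if newer_mean_score < 0.5 * experienced_mean_score:
        decisions.append("do not recommend the task to newer editors")
    if share_would_use_feed >= 0.60:
        decisions.append("pursue a feed of alt text suggested edits")
    if share_interested >= 0.60 and share_satisfied >= 0.60:
        decisions.append("share findings and consider future edit prompts")
    return decisions


print(decision_matrix(0.80, 3.5, 4.0, 0.65, 0.70, 0.50))
```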

Guardrails

Edit return rate of editors in group B or C who have received an Alt text prompt does not differ from controls by more than 10%
Revert rate for edits from experiment groups does not exceed controls by more than 5 percentage points
Human-graded* or actual revert rate for newer editors in experiment groups does not exceed 18%
Alt text task completion rate for newer editors is above 25% (completion rate = number of alt text edits published / number of editors who said “yes” to the prompt and started the flow)
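
As a worked example of the completion rate guardrail, a minimal sketch (function names are illustrative; "above 25%" is read as strictly greater than 25%):

```python
def completion_rate(published_edits, flows_started):
    """Alt text edits published divided by flows started (0.0 if none started)."""
    if flows_started == 0:
        return 0.0
    return published_edits / flows_started


def completion_guardrail_met(published_edits, flows_started, threshold=0.25):
    """True when the completion rate is strictly above the threshold."""
    return completion_rate(published_edits, flows_started) > threshold
```

With 30 published edits out of 100 started flows the rate is 30% and the guardrail is met. With 25 out of 100 it is exactly 25% and, under the strict reading, it is not.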

Curiosities (nice to have)

Did the overall constructive activation rate in the iOS app increase when we made Image recommendations available to brand new editors?
How do the task completion rate, return rate, and revert rate for newer editors’ alt text edits compare with those of experienced editors? With comparable rates from Growth suggested edits?
How does the human-graded revert rate compare to Android's Image Captions Suggested Edit?
Is there a difference in the number of alt text edits and unique editors by language and geography? (For example, breaking down edits from Latin America vs. Europe for Spanish)
What is the most common reason that users decide not to act on the prompt?

*Note: for the quality scores and human-graded revert scores, we will partner with an accessibility organization that will review and grade the alt text.

{1} Definition of newer editors: Editors who had fewer than 10 edits on the wiki they were editing at the point they entered the experiment
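
The definition above translates into a simple classification step. A sketch, assuming a mapping from a hypothetical editor identifier to that editor's edit count on the wiki at the moment they entered the experiment:

```python
NEWER_EDITOR_MAX_EDITS = 10  # "fewer than 10 edits" per the definition above


def is_newer_editor(edits_on_wiki_at_entry):
    """True when the editor had fewer than 10 edits on the wiki at entry."""
    return edits_on_wiki_at_entry < NEWER_EDITOR_MAX_EDITS


def split_by_experience(edit_counts):
    """Split {editor_id: edits at entry} into (newer, experienced) sorted id lists."""
    newer = sorted(e for e, n in edit_counts.items() if is_newer_editor(n))
    experienced = sorted(e for e, n in edit_counts.items() if not is_newer_editor(n))
    return newer, experienced
```

Classifying on the count at entry, rather than the current count, keeps an editor in the same cohort for the whole analysis window.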

Related Objects
		Status	Subtype	Assigned	Task
		Open		None	T357437 [Epic] Alt-Text Suggested Edit Experiment on iOS
		Open		None	T372015 Conduct analysis for Alt Text experiment 15 days and 30 days after experiment start