Machine Assisted Article Descriptions Experiment Close Out Task
Closed, Resolved, Public
Authored By

	JTannerWMF
	May 23 2023, 1:20 AM

Description

Background
The team has reached 30 days of the experiment and has sufficient data to start determining next steps for the Machine Assisted Article Descriptions experiment. We have deployed a feature flag, and the feature is no longer visible to users. This task is the parent for closing out the feature.

Close Tasks

Remove Model Card @Isaac
Update FAQ page to explain where the Machine Assisted Article Descriptions task went @JTannerWMF
Report findings for Machine Assisted Article Descriptions based on key indicators listed below @SNowick_WMF (June 6 2023 first draft)
Update ticket and project page with next steps @JTannerWMF (June 16 2023)

Key Indicators

Machine Assisted Article Descriptions have a higher accuracy score than human generated article descriptions, and this holds up across mBART languages
- Share how this score changes when Modified is true vs. when Modified is false
80% of Machine Assisted Article Descriptions have a score of 3 or higher
Machine Assisted Article Descriptions accuracy score is not substantially lower for new users than for experienced users (experienced: 50+ edits; new: fewer than 50 edits)
Time spent on Machine Assisted Article Descriptions is about the same as time spent on human generated article descriptions
Beam one has a higher accuracy and selection score than beam two
We have a higher proportion of users publishing the machine suggestions without modifications than with modifications
People with the experiment treatment complete a higher number of descriptions in a day than those that did not
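
A minimal sketch of how a few of the indicators above might be checked, assuming each graded description is available as a record with the hypothetical fields `source` ("machine" or "human"), `modified`, and `score`. The field names and sample rows are illustrative only; they are not the experiment's actual schema or data.

```python
from statistics import mean

# Hypothetical graded records; field names and values are illustrative only.
records = [
    {"source": "machine", "modified": False, "score": 4},
    {"source": "machine", "modified": True, "score": 5},
    {"source": "machine", "modified": False, "score": 2},
    {"source": "machine", "modified": True, "score": 3},
    {"source": "human", "modified": False, "score": 3},
    {"source": "human", "modified": False, "score": 4},
]


def mean_score(rows):
    """Average score of the given rows, or None if there are none."""
    scores = [r["score"] for r in rows]
    return mean(scores) if scores else None


def share_at_or_above(rows, threshold=3):
    """Fraction of rows whose score is at least `threshold`."""
    if not rows:
        return None
    return sum(r["score"] >= threshold for r in rows) / len(rows)


machine = [r for r in records if r["source"] == "machine"]
human = [r for r in records if r["source"] == "human"]

machine_mean = mean_score(machine)
human_mean = mean_score(human)
machine_share = share_at_or_above(machine)

# Split the machine-assisted scores by whether the suggestion was modified.
modified_mean = mean_score([r for r in machine if r["modified"]])
unmodified_mean = mean_score([r for r in machine if not r["modified"]])

print("machine mean higher than human mean:", machine_mean > human_mean)
print("80% of machine descriptions score 3+:", machine_share >= 0.80)
print("mean score, modified vs. unmodified:", modified_mean, unmodified_mean)
```

The same two helpers could be reused per language or per user tenure group by filtering `records` before calling them.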

Guardrails

The revert rate will not be higher for those that see machine assisted article descriptions than for those that did not receive the experiment
The rewrite rate will be higher for those that see machine assisted article descriptions than those that did not receive the experiment
The revert and rewrite rate will be lower for those that modified machine assisted article descriptions than those that published the machine assisted article description without modifications or purely human generated article descriptions
Fewer than 2% of users used the report function to indicate that we displayed inappropriate content
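
A rough sketch of how two of the guardrails might be evaluated from per-group counts. It assumes the revert guardrail is read as "the treatment group's revert rate should not exceed the control group's", and that the report rate is measured over users in the treatment group; both readings are assumptions, as are the group names, field names, and counts, which are hypothetical.

```python
def rate(events, total):
    """Events divided by total as a fraction; None when the total is zero."""
    return events / total if total else None


# Hypothetical counts per group; these are not the experiment's real numbers.
groups = {
    "treatment": {"edits": 200, "reverted": 6, "users": 100, "reporters": 1},
    "control": {"edits": 200, "reverted": 8, "users": 100, "reporters": 0},
}

revert_rates = {
    name: rate(g["reverted"], g["edits"]) for name, g in groups.items()
}

# Share of treatment users who used the report function at least once.
report_rate = rate(
    groups["treatment"]["reporters"], groups["treatment"]["users"]
)

# Assumed reading of the guardrails (see the note above the code).
revert_guardrail_ok = revert_rates["treatment"] <= revert_rates["control"]
report_guardrail_ok = report_rate < 0.02

print("revert rates:", revert_rates)
print("revert guardrail holds:", revert_guardrail_ok)
print("report guardrail holds:", report_guardrail_ok)
```

The rewrite rate could be computed the same way by swapping the reverted count for a count of rewritten descriptions.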

Additional Questions to Answer

How often does our experiment group (people exposed to machine assisted article descriptions) select the machine suggestion and hit publish vs. modify a suggestion vs. type out the suggestion?
What feedback did we get through the reporting feature, and what was the distribution of that feedback?
How often are users coming back to try machine assisted article descriptions again in a 30 day period (at 1, 2, 7, and 14 days), and does it differ from the users who did not get the experiment?
What are the mean vs. median lengths of time to complete tasks by user tenure, and how many response times are under 5s?
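
The last question could be approached along these lines. This is a sketch under stated assumptions: completions are available as (edit count, seconds spent) pairs, tenure uses the 50-edit threshold from the key indicators, and "response time under 5s" is interpreted as the share of completions that took less than five seconds. The sample values are hypothetical.

```python
from statistics import mean, median

EXPERIENCED_MIN_EDITS = 50  # threshold taken from the key indicators
FAST_SECONDS = 5            # assumed meaning of "response time under 5s"

# Hypothetical task completions: (user edit count, seconds spent on the task).
completions = [
    (3, 4), (10, 12), (49, 20), (0, 60),   # users with fewer than 50 edits
    (50, 3), (120, 8), (900, 10),          # users with 50 or more edits
]


def tenure(edit_count):
    """Label a user as 'experienced' or 'new' by edit count."""
    return "experienced" if edit_count >= EXPERIENCED_MIN_EDITS else "new"


# Group the time-on-task values by tenure label.
by_tenure = {}
for edits, seconds in completions:
    by_tenure.setdefault(tenure(edits), []).append(seconds)

summary = {
    group: {
        "mean": mean(times),
        "median": median(times),
        "share_under_5s": sum(t < FAST_SECONDS for t in times) / len(times),
    }
    for group, times in by_tenure.items()
}

for group, stats in summary.items():
    print(group, stats)
```

Reporting both statistics matters here because a few very long sessions pull the mean well above the median, as the hypothetical "new" group shows.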

Related Objects

		Status	Subtype	Assigned	Task
		Open		JTannerWMF	T316375 [EPIC] Machine Generated Article Descriptions
		Resolved		JTannerWMF	T337277 Machine Assisted Article Descriptions Experiment Close Out Task