Background
The Android team is bringing Image Recommendations to the app.
We want to measure:
Metric-specific leading indicators (to be captured after 15 days):
- Image rejection rate does not exceed 29%
- Edit over-acceptance rate (users who never skip or reject recommended images) does not exceed 35%
- Task completion rate is not below 30%
- Revert rate does not exceed 18%
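As a rough illustration of how the leading indicators above could be checked once the rates are available, here is a minimal Python sketch. The metric keys, the `check_leading_indicators` helper, and the example rates are hypothetical and are not part of any existing schema or dataset; only the threshold values come from the list above.

```python
# Thresholds copied from the leading indicators listed above.
# Each entry is (limit, direction): "max" means the rate must not exceed
# the limit, "min" means it must not fall below it.
THRESHOLDS = {
    "image_rejection_rate": (0.29, "max"),
    "over_acceptance_rate": (0.35, "max"),
    "task_completion_rate": (0.30, "min"),
    "revert_rate": (0.18, "max"),
}


def check_leading_indicators(observed):
    """Return {metric: bool} saying whether each threshold is met."""
    results = {}
    for name, (limit, direction) in THRESHOLDS.items():
        value = observed[name]
        results[name] = value <= limit if direction == "max" else value >= limit
    return results


# Hypothetical rates, for illustration only (not real data).
example = {
    "image_rejection_rate": 0.25,
    "over_acceptance_rate": 0.40,
    "task_completion_rate": 0.30,
    "revert_rate": 0.18,
}
print(check_leading_indicators(example))
```

In this made-up example only the over-acceptance rate misses its threshold; values exactly at a limit are treated as passing.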
Validation
- KR 1.1: 2000 articles have images in a 30-day period
- KR 1.2: Average at least 8 edits per day per unique user
- KR 1.3: 15% of eligible Suggested Editors try the image recommendations task
- by app_install_id, just by username in mediawiki_history)
- KR 1.4: 70% of those users complete the task again on a separate day in a 15-day period
- KR 1.5: Reject and accept rates do not deviate from Mobile Web or MVP by more than 10% (redundant metrics from
- KR 1.6: DAU of Suggested Edits increases overall
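One possible reading of KR 1.4 (repeat completion on a separate day within a 15-day period) is sketched below in Python. It assumes the window starts at each user's first completion and that completions arrive as `(user, date)` pairs; both the input shape and the `repeat_rate` helper are hypothetical, and the dates are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def repeat_rate(completions, window_days=15):
    """Share of users with a completion on a later day inside the window.

    completions: iterable of (user, date) pairs, one per completed task.
    Assumption: the window starts on the user's first completion day.
    """
    days_by_user = defaultdict(set)
    for user, day in completions:
        days_by_user[user].add(day)

    if not days_by_user:
        return 0.0

    returning = 0
    for days in days_by_user.values():
        first = min(days)
        # Count a distinct calendar day fewer than window_days after the first.
        if any(0 < (d - first).days < window_days for d in days):
            returning += 1
    return returning / len(days_by_user)


# Hypothetical completions, for illustration only (not real data).
sample = [
    ("a", date(2023, 5, 1)), ("a", date(2023, 5, 1)), ("a", date(2023, 5, 3)),
    ("b", date(2023, 5, 1)),
    ("c", date(2023, 5, 1)), ("c", date(2023, 5, 20)),
    ("d", date(2023, 5, 2)), ("d", date(2023, 5, 16)),
]
print(repeat_rate(sample))  # compare against the 0.70 target
```

Two completions on the same day do not count as a return, and a return 19 days later falls outside the window, so only users "a" and "d" count here.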
Guardrails
- KR 1.1: Feature does not worsen gender or geographic bias*
- KR 1.2: Less than 5% of users report NSFW or offensive content
- KR 1.3: Users spend at least 10s evaluating a task before publishing it
- KR 1.4: Bounce rate does not exceed 50%
  - Bounce rate is defined as the share of users who click Yes and then abandon the flow before publishing
- KR 1.5: At least a 35% task completion rate
  - Defined as users who select Add an image as a task and then click Yes, No, or Not sure (that is, interact with the feature)
- KR 1.6: Revert rate does not exceed 18%
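The bounce rate and task completion rate definitions above could be computed roughly as follows. This Python sketch assumes events arrive as `(user, action)` pairs; the action names (`task_open`, `yes`, `no`, `not_sure`, `publish`) and the `guardrail_rates` helper are hypothetical placeholders rather than the actual schema, and it aggregates per user rather than per session. The denominators (users who clicked Yes for bounce rate, users who opened the task for completion rate) are also an assumption.

```python
def guardrail_rates(events):
    """Compute task completion and bounce rates from (user, action) pairs.

    Action names are hypothetical placeholders, not the real schema.
    """
    actions_by_user = {}
    for user, action in events:
        actions_by_user.setdefault(user, set()).add(action)

    responses = {"yes", "no", "not_sure"}
    opened = [a for a in actions_by_user.values() if "task_open" in a]
    said_yes = [a for a in actions_by_user.values() if "yes" in a]

    # Completion: opened the task and gave any of the three responses.
    completed = sum(1 for a in opened if a & responses)
    # Bounce: clicked Yes but never published.
    bounced = sum(1 for a in said_yes if "publish" not in a)

    return {
        "task_completion_rate": completed / len(opened) if opened else 0.0,
        "bounce_rate": bounced / len(said_yes) if said_yes else 0.0,
    }


# Hypothetical events, for illustration only (not real data).
sample_events = [
    ("u1", "task_open"), ("u1", "yes"), ("u1", "publish"),
    ("u2", "task_open"), ("u2", "yes"),
    ("u3", "task_open"),
    ("u4", "task_open"), ("u4", "no"),
]
print(guardrail_rates(sample_events))
```

Here three of the four users who opened the task responded, and one of the two users who clicked Yes never published.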
Curiosities (nice to have)
- KR 1.1: Do these numbers differ by language or user tenure?
- KR 1.2: If this is a user’s first suggested edit, do they go on to try others?
- KR 1.3: How do geographically underrepresented groups on large language wikis perceive the feature?
- KR 1.4: At what point in the workflow do drop-offs occur most frequently?
Target Quant Regions and Languages
- Spanish Wikipedia
- Portuguese Wikipedia
- Persian Wikipedia
- Hindi Wikipedia
Target Qualitative Audience
- Spanish and Portuguese speakers in LATAM and Caribbean countries
- Hindi speakers in India
- Persian speakers across the diaspora
- No more than 40% of our testers should identify as male
Task
- @SNowick_WMF to create schema docs for new image recommendation schema
- Engineers to wire up instrumentation based on schema doc from @SNowick_WMF
