
[Analytics][Request] TechWish Annual Goals newcomer retention
Open, Needs Triage, Public

Description

Wikipedia Analytics Request

Purpose

Please provide as much context as possible as well as what the produced insights or services will be used for.

The TechWish team is targeting the following behavior change in 2026:

  • We want to see an increase in newcomers (< 100 edits) returning to make constructive reference edits. If we see an increase during the period in which they are still defined as newcomers, we hypothesise that the improvements we make are helping them understand and learn how to use the tooling for citations. This also means that when they grow into more experienced users (as measured by the number of edits), they are equipped with the means to continue making constructive edits and are supported by the tooling.

Constructive reference edits are defined as published edits that include a reference and are not reverted within 48 hours of being published.

The success metric for this is:

The percentage of newcomers that have successfully published their first reference using VisualEditor (with or without ReferenceCheck), that return to add another reference without ReferenceCheck within y days, increases by x%.

Please see notes from chat with Megan Neisler.
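
To make the success metric above concrete, here is a minimal Python sketch of how the percentage could be computed for a given y. It is an illustration only, not the team's implementation: the function name `return_rate`, the tuple layout, and the `used_reference_check` flag are assumptions, and the input is assumed to already contain only constructive VisualEditor reference edits by newcomers.

```python
from datetime import datetime, timedelta


def return_rate(edits, y_days):
    """Percentage of users who add another reference within y_days of their first.

    `edits` is an iterable of (user_id, timestamp, used_reference_check) tuples
    (hypothetical layout), already restricted to constructive reference edits
    made by newcomers in VisualEditor.
    """
    by_user = {}
    for user_id, ts, used_check in edits:
        by_user.setdefault(user_id, []).append((ts, used_check))

    window = timedelta(days=y_days)
    cohort = 0
    returned = 0
    for user_edits in by_user.values():
        user_edits.sort(key=lambda e: e[0])
        first_ts = user_edits[0][0]  # first reference, with or without ReferenceCheck
        cohort += 1
        # A return counts only if the later edit was made without ReferenceCheck.
        if any(
            not used_check and first_ts < ts <= first_ts + window
            for ts, used_check in user_edits[1:]
        ):
            returned += 1
    return 100.0 * returned / cohort if cohort else 0.0
```

Comparing this value between a baseline period and a later period would give the x% change; how those periods are chosen is left open here.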

Questions to be answered

  1. What is the rate (within how many days) at which newcomers return to create an edit?
  2. What is the rate (within how many days) at which newcomers return to create an edit with references?
  3. What is the percentage of newcomers that return to add another reference?
  4. What is a plausible increase that we can work towards in 2026? Determining the x% increase.
  5. What is the total number of newcomers?
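
Questions 1 to 3 ask how quickly newcomers return, which is one way to pick a sensible y. A rough Python sketch of that analysis follows: it maps several candidate day thresholds to the share of the cohort that returned within each. The function name and the input shape (one days-to-return value per newcomer, `None` for no return) are assumptions made for illustration.

```python
def cumulative_return_share(days_to_return, thresholds):
    """Map each threshold (in days) to the share of the cohort that returned within it.

    `days_to_return` holds one entry per newcomer: the number of days until
    their next edit, or None if they did not return in the observation window.
    """
    total = len(days_to_return)
    shares = {}
    for t in thresholds:
        returned = sum(1 for d in days_to_return if d is not None and d <= t)
        shares[t] = returned / total if total else 0.0
    return shares
```

Running the same function on days until the next edit of any kind and on days until the next reference edit would address questions 1 and 2 separately.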

Desired Outputs

The desired outputs of this task are listed and confirmed as being finished below.

  • Superset report for de.wiki showing total number of newcomers over time.
  • Superset report showing percentage of newcomers returning to add another reference within x days.

Deadline

Please make the time sensitivity of this request clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.

  • First round of results preferably by 30.11.2025 so that we can readjust the plan if need be.
  • The data points have to be available at the beginning of Q1 2026 so that the numbers can be updated on meta.wikipedia.org.

Information below this point is filled out by the task assignee.

Assignee Planning

Sub Tasks

A full breakdown of the steps to complete this task.

  • Explore potentially useful tables (DataHub)
    • wmf.mediawiki_history
      • editcheck-newreference revision tag for visual editor edits to main namespace pages that add a net new reference
      • revision_is_identity_reverted AND revision_seconds_to_identity_revert <= 172800 for reverted in 48 hrs (not a "constructive" edit)
      • event_user_revision_count < 100 for newcomer
    • research.mediawiki_content_diff - parse parent_revision_diff to identify URLs
    • canonical_data.wikis for deriving Wikipedias (database_code to join and database_group = 'wikipedia')
  • Derive base check metric to compare results against
  • Read related research
  • Check past work related to this process
  • Investigate other metadata that's easily accessible with the edits to see if potential breakdowns would be useful
    • No other fields within wmf.mediawiki_history looked useful in combination with these fields, and the process is quite resource intensive, so I decided to stick with what we have
  • Finalize outputs for metrics data process (check with TechWish Product and Engineering)
  • Write needed create table queries
  • Transfer work into metrics HQL scripts
  • Write Airflow DAG and testing files
  • Test and deploy Airflow DAG
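
The wmf.mediawiki_history conditions listed in the sub tasks above can be combined into a single row-level check. The Python sketch below is only meant to show the logic; the planned implementation is HQL. The function name is hypothetical, the rows are plain dicts, and `revision_tags` is assumed to be the field that carries the editcheck-newreference tag. Rows with a missing edit count are treated as not being newcomers, which is also an assumption.

```python
REVERT_WINDOW_SECONDS = 48 * 60 * 60  # 172800, the 48-hour threshold from the sub tasks
NEWCOMER_MAX_EDITS = 100


def is_newcomer_constructive_reference_edit(row):
    """Return True if a mediawiki_history-style row meets all three conditions."""
    # Assumption: the tag is stored in a list-like `revision_tags` field.
    has_new_reference = "editcheck-newreference" in (row.get("revision_tags") or [])

    # Assumption: a missing edit count means the row is not counted as a newcomer.
    edit_count = row.get("event_user_revision_count")
    is_newcomer = edit_count is not None and edit_count < NEWCOMER_MAX_EDITS

    # Reverted within 48 hours means the edit is not "constructive".
    seconds_to_revert = row.get("revision_seconds_to_identity_revert")
    reverted_in_window = (
        bool(row.get("revision_is_identity_reverted"))
        and seconds_to_revert is not None
        and seconds_to_revert <= REVERT_WINDOW_SECONDS
    )
    return has_new_reference and is_newcomer and not reverted_in_window
```

An edit reverted after more than 48 hours still counts as constructive under this definition, which mirrors the `<= 172800` condition above.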

Estimation

Estimate: 5 days
Actual:

Data

The tables that will be referenced in this task.

  • wmf.mediawiki_history
  • canonical_data.wikis

Notes

Things that came up during the completion of this task, questions to be answered and follow up tasks.

  • Note

Details

Related Changes in GitLab:
  • Title: T409552 Add DAGs to compute wikipedia newcomer retention
  • Reference: repos/data-engineering/airflow-dags!1865
  • Author: andrewtavis-wmde
  • Source Branch: T409552-wikipedia-newcomer-dags
  • Dest Branch: main

Event Timeline