Surface Reference survival signal within VE
Open, Needs Triage, Public

Description

This is a meta-task to cover the work involved in equipping volunteers, across experience levels, with the information and tools they need to decide how reliable other volunteers are likely to consider the source(s) they:

  1. Are attempting to add to Wikipedia
  2. Encounter while reading/patrolling Wikipedia

Risks/open questions

  • 1. How might we treat domains that have low survivability scores (e.g., YouTube, Facebook, Wikipedia itself) but can still be valid for specific references? via @Pablo.
    • For the use cases involving highlighting risky domains in existing revisions, it might be worth prioritizing recent revisions that have added a risky domain, since we may not want to flag references that were added a long time ago and have already survived several edits on the page.
  • 2. How might we craft the UX in such a way that it encourages people to consider reference survivability as one input, among the many volunteers need, to decide whether a given reference warrants discussion/removal/etc.? via User:Kowal2701.

Background

As noted in T265163, inexperienced editors often make edits that defy the policies and guidelines of the project they are editing. One such policy we see new editors break (knowingly and unknowingly) is the requirement to cite reliable sources. [i]

This task is about equipping volunteers with the information and actions they need to decide if and how they will:

  1. Proceed with citing the source they are attempting/considering adding to Wikipedia
  2. Engage with sources that are already published on Wikipedia

Ultimately, this fits within the larger effort to help newcomers make edits they are proud of and experienced volunteers consider useful and aligned with Wikipedia's larger objectives.


Links

Models

Research

Relevant conversations

Tools

Source lists
Listed at https://www.wikidata.org/wiki/Q59821108


i. https://en.wikipedia.org/w/index.php?title=User_talk:ENieves1&type=revision&diff=1009512219&oldid=1009437193&diffmode=source

Related Objects

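Status | Subtype | Assigned | Task
Open | | None |
Open | | None |
Resolved | Spike | nayoub |
Open | | None |
Open | | None |
Open | | None |
Open | | None |
Open | | None |
Open | | None |
Open | | None |
Resolved | | nayoub |
Resolved | | None |
Resolved | | ppelberg |
Resolved | | Esanders |
Open | | None |
Open | | None |
Resolved | | MNeisler |
Resolved | | ppelberg |
Open | | None |
Resolved | | None |
Open | | None |
Resolved | | MNeisler |
Resolved | | Ryasmeen |
Resolved | | Esanders |
Resolved | | Esanders |
Open | | None |
Resolved | | DLynch |
Open | | None |
Resolved | | ppelberg |
Open | | None |
Open | | None |
Open | | None |
Resolved | | DLynch |
Open | | None |
Open | | None |
Open | | None |
Open | | Esanders |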

Event Timeline

ppelberg updated the task description.
ppelberg added a subscriber: nayoub.
ppelberg updated the task description.
ppelberg added subscribers: Pablo, Isaac.
ppelberg updated the task description.
ppelberg updated the task description.

It would be nice to have the "Perennial sources" list under the community config page so it can be turned off on the small and medium-sized wikis, most of which do not have one.

The only global rule I am aware of is that the following are blacklisted: blog sites, sites registered in WHOIS under the same name as the article (especially the website of the company an article is about) and, of course, anything on the local and global spam blacklists.

ppelberg renamed this task from Make editors aware when they are attempting to add unreliable sources to an article. to Offer volunteers feedback about the reliability of the source they are attempting to add. Nov 6 2023, 8:09 PM
ppelberg updated the task description.
ppelberg updated the task description.
ppelberg added a project: Editing-team.
ppelberg moved this task from Untriaged to Larger Strategic Things on the Editing-team board.
ppelberg renamed this task from Offer volunteers feedback about the reliability of the source they are attempting to add to [Check] Offer volunteers feedback about the reliability of the source they are attempting to add. Nov 14 2023, 8:17 PM
ppelberg updated the task description.

@ppelberg as I saw that you have added the language-agnostic reference risk model card, please find the datasets with the risk scores for each domain in each wiki at https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/reference-risk

Per offline discussion with @MMiller_WMF: maybe a first step here could be evaluating the extent to which the content someone is attempting to add to Wikipedia is supported by what's written in the source they're citing.

ppelberg renamed this task from [Check] Offer volunteers feedback about the reliability of the source they are attempting to add to [Ref. Reliability Check] Offer volunteers feedback about the reliability of the source they are attempting to add. May 30 2025, 3:58 AM
ppelberg renamed this task from [Ref. Reliability Check] Offer volunteers feedback about the reliability of the source they are attempting to add to Surface Reference Reliability signal within VE. Oct 31 2025, 5:45 PM
ppelberg updated the task description.

Jotting down some notes from offline discussion with @Pablo and @Sucheta-Salgaonkar-WMF about the Language-agnostic Reference Risk model:

  • The model is already available on LiftWing: https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reference_risk_prediction#Anonymous_access.
  • At present, the model accepts a revisionID and lang (e.g. https://en.wikipedia.org/w/index.php?title=Lux_(Rosal%C3%ADa_album)&oldid=1322859357) and outputs the following:
    • ps_label_local: the status of that web domain in the perennial sources list of that wiki (if one exists),
    • ps_label_enwiki: the status of that web domain in the English Wikipedia perennial sources list (if it is listed),
    • survival_ratio: using data from the model_version (currently 2024-11), the survival ratio of that web domain when used as a reference on that wiki (i.e., the number of edits the domain stayed on the page divided by the total number of edits since it was added). Values range from 0 to 1 (the closer to 0, the riskier),
    • page_count: using data from the model_version (currently 2024-11), the number of pages on which that web domain has been used as a reference on that wiki,
    • editors_count: using data from the model_version (currently 2024-11), the number of editors who have used that web domain as a reference on that wiki.
  • The model does not use ML but applies pre-computed scores of references in each wiki. Scores for the 2024-06 version can be found at https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/reference-risk
  • The model is language-agnostic
  • The model has not been tested with volunteers yet. The heuristic of using the survival of references comes from these ML experiments we ran: https://arxiv.org/abs/2410.18803
  • The model is using pre-computed scores for any single source (data folder, e.g., these are scores for frwiki). Therefore, Edit Check or any other service interested in scores for a single source could already use them directly as well.
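To make the fields above concrete, here is a minimal sketch of how a consumer such as Edit Check might work with pre-computed per-domain scores. It is an illustration under assumptions, not the model's implementation: the `DomainScore` record, its `domain` field name, the `low_survival_domains` helper, and the `max_ratio` and `min_pages` thresholds are all hypothetical, and the sample records are invented. The `survival_ratio` helper only restates the definition given in the notes for a single reference; how the model aggregates per domain is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class DomainScore:
    """One pre-computed record for a web domain on a given wiki (assumed shape)."""
    domain: str  # hypothetical field name, not confirmed by the notes
    survival_ratio: float  # 0 to 1; the closer to 0, the riskier
    page_count: int  # pages on which the domain is used as a reference
    editors_count: int  # editors who have used the domain as a reference
    ps_label_local: Optional[str] = None  # local perennial sources status, if any
    ps_label_enwiki: Optional[str] = None  # enwiki perennial sources status, if any


def survival_ratio(edits_survived: int, edits_since_addition: int) -> float:
    """Edits the reference stayed on the page, over all edits since it was added."""
    if edits_since_addition <= 0:
        raise ValueError("edits_since_addition must be positive")
    if not 0 <= edits_survived <= edits_since_addition:
        raise ValueError("edits_survived must be between 0 and edits_since_addition")
    return edits_survived / edits_since_addition


def low_survival_domains(
    scores: Iterable[DomainScore],
    max_ratio: float = 0.5,  # hypothetical cut-off, to be tuned with volunteers
    min_pages: int = 10,  # hypothetical floor to skip sparsely used domains
) -> List[DomainScore]:
    """Return domains at or below max_ratio, riskiest first."""
    flagged = [
        s for s in scores
        if s.page_count >= min_pages and s.survival_ratio <= max_ratio
    ]
    return sorted(flagged, key=lambda s: s.survival_ratio)


# Invented sample records on reserved example domains, for illustration only.
SAMPLE = [
    DomainScore("often-removed.example", survival_ratio(20, 100), page_count=40, editors_count=25),
    DomainScore("rarely-removed.example", survival_ratio(95, 100), page_count=500, editors_count=300),
    DomainScore("sparse.example", survival_ratio(1, 10), page_count=2, editors_count=1),
]

for score in low_survival_domains(SAMPLE):
    print(score.domain, score.survival_ratio)
```

The `min_pages` floor reflects one possible design choice: a ratio computed from very few pages is a weak signal, so page_count and editors_count can be used to decide whether the survival ratio should be surfaced at all.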
ppelberg renamed this task from Surface Reference Reliability signal within VE to Surface Reference survival signal within VE. Feb 20 2026, 7:35 PM
ppelberg updated the task description.
NOTE: the table below originally appeared within the task description. However, all of the rows within it are no longer relevant, since steps "0.", "1.", and "2." are likely made obsolete/unnecessary by the Language-agnostic reference risk model: https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Language-agnostic_reference_risk.

Components

Consider these components to be an evolving list...

Component | Description | Ticket(s) | Notes
0. | Introduce the concept of reliable sources to people adding a reference to Wikipedia for the first time | T350322 |
1. | A way for volunteers, on a per project basis, to define, in a machine-readable way, what sources they reached consensus on being reliable and unreliable. | T337431 |
2. | A way for volunteers to add to and edit the "list" described in "1." | T330112 |
3. | A way for the editing interface to check a source someone is attempting to add "against" the "list" described in "1." | T349261 |
4. | A way to make the person editing aware, in real-time, when they have added a source that defies the project's policies | T347531 |
5. | Information that helps newcomers decide how likely experienced volunteers are to perceive the source they're adding as reliable in this specific context | T350319 |
6. | A way for volunteers to audit/see the edits reference reliability feedback is shown within | T350622 |
7. | A way for volunteers to report issues with the reference reliability feedback they see Edit Check providing | T343168 |
8. | A way for experienced volunteers to define what message people are presented with when attempting to cite a source a project has developed a reliability-related consensus around | T337431 |