
[Check] Offer volunteers feedback about the reliability of the source they are attempting to add
Open, Needs Triage, Public

Description

This is a meta-task covering the work involved in equipping newcomers with the information and tools they need to judge how experienced volunteers are likely to perceive the reliability of the sources they're considering adding.

Background

As noted in T265163, inexperienced editors often make edits that defy the policies and guidelines of the project they are editing. One such policy we see new editors break (knowingly and unknowingly) is the requirement to cite reliable sources. [i]

This task is about equipping volunteers with the information and actions they need to decide if and how they will proceed with citing the source they are attempting/considering adding to Wikipedia.

Ultimately, this fits within the larger effort to help newcomers make edits they are proud of and experienced volunteers consider useful and aligned with Wikipedia's larger objectives.

Components

Consider these components to be an evolving list... (a rough sketch of how a few of them could fit together follows the table)

| Component | Description | Ticket(s) | Notes |
|---|---|---|---|
| 0 | Introduce the concept of reliable sources to people adding a reference to Wikipedia for the first time | T350322 | |
| 1 | A way for volunteers, on a per-project basis, to define, in a machine-readable way, which sources they reached consensus on being reliable and unreliable | T337431 | |
| 2 | A way for volunteers to add to and edit the "list" described in "1." | T330112 | |
| 3 | A way for the editing interface to check a source someone is attempting to add against the "list" described in "1." | T349261 | |
| 4 | A way to make the person editing aware, in real time, when they have added a source that defies the project's policies | T347531 | |
| 5 | Information that helps newcomers judge how likely experienced volunteers are to perceive the source they're adding as reliable in this specific context | T35031 | |
| 6 | A way for volunteers to audit/see the edits within which reference reliability feedback is shown | T350622 | |
| 7 | A way for volunteers to report issues with the reference reliability feedback they see Edit Check providing | T343168 | |
| 8 | A way for experienced volunteers to define the message people are presented with when attempting to cite a source a project has developed a reliability-related consensus around | T337431 | |
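To make these components concrete, here is a minimal sketch of how components 1, 3, and 4 could fit together. Everything in it is illustrative: the list format, domain keys, status values, and messages are assumptions, not anything specified in this task.

```python
# Illustrative sketch only; none of these names or formats come from the task.
from urllib.parse import urlparse

# Component 1: a hypothetical per-project, machine-readable consensus list.
# A real list would be maintained on-wiki and fetched by the editing interface.
RELIABILITY_LIST = {
    "example-tabloid.com": "unreliable",
    "example-journal.org": "reliable",
}

def check_source(url: str) -> str | None:
    """Component 3: return the project's consensus status for a URL's
    domain, or None if the project has recorded no consensus."""
    domain = (urlparse(url).hostname or "").removeprefix("www.")
    return RELIABILITY_LIST.get(domain)

# Component 4: surface the result to the person editing, in real time.
status = check_source("https://www.example-tabloid.com/story")
if status == "unreliable":
    print("Heads up: this project considers this source unreliable.")
elif status is None:
    print("This project has no recorded consensus about this source.")
```

Note that the `None` branch deliberately stays neutral; as discussed further down in this task, feedback like this shouldn't discourage people from citing sources that simply aren't on the list yet.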

Links

Research

Relevant conversations

Tools

Source lists
Listed at https://www.wikidata.org/wiki/Q59821108


i. https://en.wikipedia.org/w/index.php?title=User_talk:ENieves1&type=revision&diff=1009512219&oldid=1009437193&diffmode=source

Event Timeline

Thank you @Trizek-WMF and @Whatamidoing-WMF for the links you shared during today's Monday morning meeting. I've added them to the task description's Links section.

Samwalton9-WMF renamed this task from Make editors aware when they are attempting to add unrelaible sources to an article. to Make editors aware when they are attempting to add unreliable sources to an article..Mar 9 2021, 1:34 PM

A more fundamental learning might be that citations are needed at all. We could also consider alerting editors when they add new content but don't add a citation. The research team developed a Citation Needed model that could do the heavy lifting on understanding whether a citation is needed for a given piece of text.
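A minimal sketch of that simpler check, assuming wikitext input: flag substantial new text that contains no <ref> tag at all. The regex and length threshold are illustrative heuristics; a model like Citation Needed would replace them with per-sentence predictions.

```python
import re

# Matches "<ref>" or "<ref " (e.g. <ref name="...">) in wikitext.
REF_TAG = re.compile(r"<ref[\s>]", re.IGNORECASE)

def added_text_lacks_citation(added_text: str, min_length: int = 100) -> bool:
    """Heuristic: substantial new wikitext with no reference tag."""
    stripped = added_text.strip()
    return len(stripped) >= min_length and not REF_TAG.search(stripped)

print(added_text_lacks_citation(
    "The company was founded in 1987 and quickly grew to dominate "
    "the regional market before being acquired a decade later."
))  # True: long enough, and no <ref> anywhere in the added text
```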

Communities already have two tools to block some links: the local blocklist and AbuseFilter. At the moment, it is not possible to know whether a link will be blocked before you hit "publish", and, when your edit is blocked, the faulty link is not highlighted.

The spam blacklist tells you which link caused your edit to be prevented, but doesn't show you where it is in the article, and it's buried under various other text:

[Screenshot: the spam blacklist warning message]

ppelberg added a subscriber: MMiller_WMF.

Task description update

> A more fundamental learning might be that citations are needed at all. We could also consider alerting editors when they add new content but don't add a citation.

Great spot and agreed.

> The research team developed a Citation Needed model that could do the heavy lifting on understanding whether a citation is needed for a given piece of text.

@Samwalton9 this is the first time I'm hearing of this... can you confirm this is the research you were referring to: https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements ?

> @Samwalton9 this is the first time I'm hearing of this... can you confirm this is the research you were referring to: https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements ?

That's the one :)

Regarding the machine-readable storing of reliable/unreliable classifications, I have a couple of thoughts. First, @Newslinger has been working on a tool to take the English Wikipedia's table and turn it into a more usable format - looks like you can read more about that here.

Second, I've been feeling cautious about the idea of telling users explicitly what is or isn't reliable as they edit. On the one hand it seems like an obvious thing to do - the community already has this table with encoded community conventions which we could make available to new users more readily. On the other hand we would risk strengthening Wikipedia projects' biases around sourcing. We might want to make sure we design such a feature in a way that doesn't actively discourage adding sources which aren't in the list yet (i.e. don't train editors to look for a sign of approval that a source is definitely reliable). We already have a problem with editors not understanding what sources are reliable in different languages/countries/contexts (see an effort to alleviate this issue at Wikipedia:New page patrol source guide). Maybe this is an inherent tension with attempting to codify fuzzy norms and practices.

Thanks for the ping, @Samwalton9.

Here is an example of what the machine-readable data for the English Wikipedia's perennial sources list looks like in JSON form:

https://api.sourceror.org/v1/all_entries
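Here is a minimal sketch of consuming that endpoint. The response schema isn't documented in this task, so the code inspects the JSON generically rather than assuming field names.

```python
import json
from urllib.request import urlopen

# Fetch the full machine-readable perennial sources data.
with urlopen("https://api.sourceror.org/v1/all_entries") as resp:
    data = json.load(resp)

# Print a small sample so the actual structure can be inspected;
# whether the top level is a list or an object is not assumed here.
sample = data[:3] if isinstance(data, list) else data
print(json.dumps(sample, indent=2)[:1000])
```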

The data is scraped and parsed from the perennial sources list. This format can be adapted for equivalent source lists on other Wikipedias. The Wikidata entry links to several non-English lists, two of which can be parsed to a machine-readable format.

On the English Wikipedia, the AbuseFilter and blocklist features are able to handle some use cases for this data (deprecation and blacklisting, respectively). However, there are limitations that reduce the effectiveness and hinder community acceptance of these technical measures:

  • As of February 2021, the Wikipedia apps for Android and iOS are not able to display edit filters, according to the table in this noticeboard discussion.
  • There is currently no way to apply blocklist entries to a selection of pages. All patterns on the blocklist apply to all pages on the project.
  • Neither the AbuseFilter nor the blocklist provides a simple way to target or ignore content additions to particular sections of a page.
  • As Samwalton9 mentioned, the messaging associated with these technical measures could be improved. The community can handle some of this, but it would be helpful to have more data available that could be incorporated in the template messages. For example, when a user adds a link that is either deprecated or blacklisted, the error or warning message should ideally show the paragraph surrounding the link. (A rough sketch of a client-side pre-save check follows this list.)
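As a sketch of the pre-save check and link highlighting mentioned earlier in this thread, the following assumes the usual MediaWiki spam blacklist format (one regular-expression fragment per line, with `#` starting a comment); the entries shown are invented for illustration.

```python
import re

# Invented entries in the spam blacklist's one-regex-per-line format.
BLACKLIST_TEXT = r"""
# Illustrative entries only
\bexample-spam\.com\b
\bbit\.ly\b
"""

def load_patterns(text: str) -> list[re.Pattern]:
    """Parse blacklist lines into compiled regexes, skipping comments."""
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns

def first_blocked_link(links: list[str]) -> str | None:
    """Return the first link that would trip the blacklist, so the
    editing interface could highlight it before the edit is saved."""
    patterns = load_patterns(BLACKLIST_TEXT)
    for link in links:
        if any(p.search(link) for p in patterns):
            return link
    return None

print(first_blocked_link([
    "https://en.wikipedia.org/wiki/Example",
    "https://bit.ly/abc123",
]))  # https://bit.ly/abc123
```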

Peter and I have been discussing using the Spam blacklist as a starting point for this, as entries there are more easily categorisable as obviously undesirable, whereas the perennial sources list has many entries with edge cases and nuances and is, as yet, a few steps further removed from something the editor could easily parse (especially because it doesn't exist on all wikis). I did some investigation today looking through the English Wikipedia blacklist log and found that entries can be broadly categorised into the following (often overlapping) buckets: spam, URL shorteners, and unreliable sources. The following are some notes on how ~200 randomly selected log entries broke down:

  • Spam (40%): These were entries clearly designed to lead readers to some website selling a product, hosting a suspicious file, or otherwise of no encyclopedic value whatsoever.
  • URL shorteners (35%): These are entries which introduced links to websites like bit.ly, youtu.be, or Google Amp. These are on the Spam Blacklist because they can disguise their destination, but I was surprised at the volume of hits this category receives. It's worth pointing out that many of these links might have been to spam sites, but I'm sure many were good faith edits.
  • Unreliable sources (25%): These links appeared like they could be useful references for articles. While I'm sure many have been spammed or aren't even remotely reliable, I could imagine most of these link additions having been made in good faith by a new user.

I'm posting this here because I think this backs up the idea of the spam blacklist being a sensible place to start - if 90%+ of hits were clearly from spam bots I might have suggested another approach, but as much as 60% of spam blacklist hits are at least potentially being made in good faith and many more entries on the blacklist are to unreliable sources than I had previously thought.

We could imagine three lanes of guidance based on this categorisation, explaining that the user's edit won't successfully be saved, and then providing guidance to move away from a URL shortener, to use a more reliable source, or to check that a link isn't complete garbage. This is prompting me to think about how we could facilitate categorisation on the Spam blacklist to match entries to these specific guidance paths; I'm not sure how that would work right now.
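A sketch of those three lanes, assuming blacklist entries could somehow be tagged with one of the categories above; the category names and message wording are illustrative, and the tagging mechanism is exactly the open question.

```python
# Map an (assumed) blacklist-entry category to a guidance message.
GUIDANCE = {
    "url_shortener": (
        "This link uses a URL shortener, which is blocked because it "
        "hides the destination. Please cite the destination URL directly."
    ),
    "unreliable_source": (
        "This source is blocked on this project because it is considered "
        "unreliable. Consider citing a more reliable source instead."
    ),
    "spam": (
        "This link is on the spam blacklist and the edit cannot be saved "
        "while it is present. Please check that the link is appropriate."
    ),
}

def guidance_for(category: str) -> str:
    # Fall back to a generic message for entries that aren't tagged.
    return GUIDANCE.get(
        category,
        "This link is blocked on this project, so the edit cannot be "
        "saved while it is present.",
    )

print(guidance_for("url_shortener"))
```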

ppelberg updated the task description. (Show Details)
ppelberg added a subscriber: nayoub.
ppelberg updated the task description. (Show Details)
ppelberg added subscribers: Pablo, Isaac.

It would be nice to have the "Perennial sources" list under the community configuration page so it can be turned off on small and medium-sized wikis, most of which do not have one.

The only global rules I am aware of are that the blocklist covers blog sites, sites whose whois registration matches the article's name (especially the website of the company an article is about), and of course the local and global spam blacklists.

ppelberg renamed this task from Make editors aware when they are attempting to add unreliable sources to an article. to Offer volunteers feedback about the reliability of the source they are attempting to add.Nov 6 2023, 8:09 PM
ppelberg updated the task description. (Show Details)
ppelberg updated the task description. (Show Details)
ppelberg added a project: Editing-team.
ppelberg moved this task from Untriaged to Larger Strategic Things on the Editing-team board.
ppelberg renamed this task from Offer volunteers feedback about the reliability of the source they are attempting to add to [Check] Offer volunteers feedback about the reliability of the source they are attempting to add.Nov 14 2023, 8:17 PM