Provide a way to reject links (instead of skip)
Closed, Resolved (Public)

Description

There should be a way to reject a link so that it is never proposed again to any user. Skip does not do that.

http://OAbot.org finds paywalled citations on Wikipedia and suggests an open link to add. This makes Wikipedia more open access, from the article context, through to the citations.

OAbot added 2,000 links through this year's Open Access Week, but has been a victim of its success: the suggestion queue is showing links that are low-quality and have already been 'skipped'. We need to implement a 'reject' option to keep the quality of the queue high.

Let's start with the specs: the tool is written in Python with Flask. It stores candidate edits as JSON files on disk. Proposed edits are represented by the TemplateEdit class declared in main.py.

Our highest-priority task is to add a [reject] button in addition to [skip] and [add link]. For this, we need to decide how to represent rejected edits and choose a form of storage for the list of rejects. All matching edits in the store of proposed edits must then be invalidated, and candidate edits discovered afterwards must be matched against this list of rejects to filter them out.
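
The three steps above (store rejects, invalidate stored edits, filter new candidates) could be sketched roughly as follows. This is only an illustration under stated assumptions: rejects are keyed by the proposed URL alone, candidate edits are shown as plain dicts with a hypothetical `proposed_link` field, and the names `RejectStore` and `filter_candidates` are made up here. The real TemplateEdit class in main.py may look quite different.

```python
import json
from pathlib import Path


class RejectStore:
    """Hypothetical JSON-backed set of rejected link URLs."""

    def __init__(self, path):
        self.path = Path(path)
        self.rejected = set()
        if self.path.exists():
            # The file holds a JSON array of URL strings.
            self.rejected = set(json.loads(self.path.read_text()))

    def reject(self, link):
        """Record a link as rejected and persist the list to disk."""
        self.rejected.add(link)
        self.path.write_text(json.dumps(sorted(self.rejected)))

    def is_rejected(self, link):
        return link in self.rejected


def filter_candidates(candidates, store):
    """Drop every candidate edit whose proposed link has been rejected.

    `candidates` is a list of dicts with a 'proposed_link' key
    (an assumed shape, not the actual OAbot data model).
    """
    return [c for c in candidates if not store.is_rejected(c["proposed_link"])]
```

The same `filter_candidates` call would serve both purposes: run once over the already stored edits right after a rejection to invalidate them, and run again on each batch of newly discovered candidates.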

This is a remote-friendly project and those participating in the do-a-thon remotely can follow this GitHub issue for updates, as well as the OAbot GitHub codebase and the Phabricator workboard. People from around the world are already working on this tool, so feel free to jump in from wherever you are!

Tool: https://oabot.org
Codebase: https://github.com/dissemin/oabot
Workboard: https://phabricator.wikimedia.org/project/view/2734/
Documentation: https://en.wikipedia.org/wiki/Wikipedia:OABOT
Live chat: https://kiwiirc.com/client/irc.freenode.net/#wikipedia-library

Event Timeline

A3nm raised the priority of this task from Medium to High. Oct 26 2017, 8:35 AM
A3nm subscribed.

Marking this as high priority, because currently new users start on OAbot by seeing a succession of bad edits (in particular broken links). This is discouraging to them, and all the effort invested in checking and skipping them is lost (it doesn't benefit new users).

According to MediaWiki principles, it would be ideal if the list of rejected links were hosted on a wiki page, e.g. in the Project namespace of the target wiki or on Meta-Wiki. It can also be in JSON format. This assumes, of course, that it's not easier for you to handle a database with manual updates and change tracking.

+1.
Just in case useful, here's an example: [[Camillo Golgi]] gave me

Add link: https://digital.csic.es/bitstream/10261/62299/1/accesoRestringido.pdf

which just says

The full text of this item is not available because it has not been provided by its author yet; because there are copyright restrictions; or because a digital version does not exist

@Nemo_bis thanks! Yeah I get the idea, but there is quite a lot of work to get from such a prototype to something solid (like, in this state we would query the enwiki page every time we get a candidate edit)… Also, I think the simplest would be to store candidate edits in SQL directly so that we could filter them out easily when we add a new link to the blacklist.
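
To illustrate the SQL idea, here is a rough sketch using Python's built-in sqlite3 module. The table names, columns and function names are assumptions made for the example, not the schema OAbot actually uses.

```python
import sqlite3


def init_db(conn):
    """Create the two hypothetical tables used in this sketch."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS candidate_edits (
            id INTEGER PRIMARY KEY,
            page TEXT NOT NULL,
            proposed_link TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS rejected_links (
            link TEXT PRIMARY KEY
        );
    """)


def reject_link(conn, link):
    """Blacklist a link and invalidate every stored edit proposing it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO rejected_links (link) VALUES (?)", (link,))
        conn.execute(
            "DELETE FROM candidate_edits WHERE proposed_link = ?", (link,))


def add_candidate(conn, page, link):
    """Store a new candidate edit unless its link was rejected before.

    Returns True if the candidate was stored, False if it was filtered out.
    """
    with conn:
        hit = conn.execute(
            "SELECT 1 FROM rejected_links WHERE link = ?", (link,)).fetchone()
        if hit is not None:
            return False
        conn.execute(
            "INSERT INTO candidate_edits (page, proposed_link) VALUES (?, ?)",
            (page, link))
    return True
```

With this layout, rejecting a link is a single transaction that both records the reject and clears matching candidates, and no wiki page has to be queried for each new candidate edit.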

@Quiddity : the fact that some links are bad is not a bug in itself: if every link was good, we would not ask people to review them in the first place, we would just add them all with a bot… human judgment is required and that's the point of this tool!

I would also suggest adding a field or a drop-down menu for the reason for rejection; I think it would be useful to know.
Some reasons that I could think of:

  • broken link;
  • the link is pointing to a closed article or a paywall;
  • the link is pointing to a non-copyright-compliant source;
  • the addressed resource is different from the citation.
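
As a rough sketch, the reasons listed above could be modelled as an enumeration that feeds the drop-down menu. The class name, member names and the `reason_choices` helper are hypothetical, chosen only for this example.

```python
from enum import Enum


class RejectReason(Enum):
    """Hypothetical rejection reasons, taken from the list above."""
    BROKEN_LINK = "broken link"
    PAYWALLED = "link points to a closed article or a paywall"
    COPYRIGHT = "link points to a non-copyright-compliant source"
    WRONG_RESOURCE = "linked resource is different from the citation"


def reason_choices():
    """Return (value, label) pairs suitable for an HTML <select> element."""
    return [(reason.name, reason.value) for reason in RejectReason]
```

Storing the member name (for example `BROKEN_LINK`) alongside each rejected link would make it possible to count rejections per reason later.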
Ocaasi_WMF raised the priority of this task from High to Needs Triage. Nov 13 2017, 9:55 AM
Ocaasi_WMF updated the task description.

This bug is the focus of an OpenCon2017 do-a-thon hackday project on Monday, November 13th (today)! People are coordinating (or landing for info) on this page: https://github.com/sparcopen/doathon/issues/46

@Pintoch If you are around today there might be some helpers who could use tips.

@Pintoch I'm getting no response from multiple computers when I click the "start editing" button. Server down? Code breakage?

Yep, the issue was introduced when I deployed the new project structure - it's fixed now, I think.

Ocaasi_WMF raised the priority of this task from High to Needs Triage. Nov 13 2017, 12:17 PM

https://github.com/dissemin/oabot/commit/9f729396ce541147adc0e768dbc9c6350216e123 was merged in February and I see it on https://tools.wmflabs.org/oabot/ . I still recommend switching to an on-wiki blacklist, but such improvements can be filed separately.