
Newcomer tasks: investigate ability to identify articles with no outbound links
Closed, Resolved (Public)

Description

In T229430, we looked at which maintenance templates are available in our target wikis and how many articles are tagged with them. We have a couple of concerns:

  • Although thousands of articles have maintenance templates, we're concerned that once narrowing to topics of interest, there won't be enough articles for newcomers to work on.
  • The target wikis don't all have the same templates. For instance, while they all have some copy edit templates, only Arabic uses a template to tag articles that need more outgoing links.

Because of those concerns, it is worth investigating our ability to supplement the maintenance templates. The highest priority is the ability to detect which articles need more outgoing links, because we believe adding links is one of the best tasks for newcomers. This is the Arabic category holding the articles that Arabic editors judge to have this condition: https://ar.wikipedia.org/wiki/تصنيف:جميع_مقالات_النهاية_المسدودة

The most basic heuristic would just be to list those articles that have no internal wikilinks at all.

A more sophisticated approach might have rules like these:

  • They are greater than 100 characters.
  • They have no internal wikilinks in the text of the article (not counting infoboxes).

Or even rules like these:

  • They have fewer than one wikilink per 500 characters.
  • They have no internal wikilinks in the text of the article (not counting infoboxes).
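The rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the regex-based link counting, the list of skipped prefixes, and the crude removal of `{{...}}` templates (so that infobox links are not counted) are all hypothetical choices, not an agreed implementation. Real wikitext is messier than this handles, and article length is measured here on the raw wikitext.

```python
import re

# Matches [[Target]] and [[Target|label]] and captures the target.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

# Assumed non-article prefixes; localized namespace names would need adding per wiki.
NON_ARTICLE_PREFIXES = ("file:", "image:", "category:")


def strip_templates(wikitext: str) -> str:
    """Crudely drop {{...}} templates (nested ones too) so infobox links are ignored."""
    depth = 0
    kept = []
    i = 0
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            depth += 1
            i += 2
        elif wikitext.startswith("}}", i) and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                kept.append(wikitext[i])
            i += 1
    return "".join(kept)


def count_body_wikilinks(wikitext: str) -> int:
    """Count internal wikilinks outside templates, skipping file/category links."""
    targets = WIKILINK_RE.findall(strip_templates(wikitext))
    return sum(
        1 for t in targets
        if not t.strip().lower().startswith(NON_ARTICLE_PREFIXES)
    )


def is_dead_end(wikitext: str, min_chars: int = 100) -> bool:
    """First rule set: longer than min_chars and no wikilinks in the body text."""
    return len(wikitext) > min_chars and count_body_wikilinks(wikitext) == 0


def is_underlinked(wikitext: str, chars_per_link: int = 500) -> bool:
    """Second rule set: fewer than one body wikilink per chars_per_link characters."""
    return count_body_wikilinks(wikitext) < len(wikitext) / chars_per_link
```

An article whose only link sits inside an infobox is treated as a dead end, because the template is stripped before links are counted. The `min_chars` and `chars_per_link` defaults simply mirror the numbers in the rules and would need tuning per wiki.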

As an output, it would be good to know how many articles in each of our target wikis fit these sorts of rules. In Arabic Wikipedia, we would also want to know how many do and don't overlap with the articles having this category: https://ar.wikipedia.org/wiki/تصنيف:جميع_مقالات_النهاية_المسدودة

Note: wikis have a "Special:Dead-end pages" page that has some method for automatically listing articles with no outbound links: Special:DeadendPages. However, this page lists every dead-end page.

Event Timeline

I'm moving this to Ready for Development, because I think it's likely we'll want this ability at some point, and other newcomer task tickets aren't ready yet.

They are greater than 100 characters.

That is too low IMO. An article that's long enough to spend some time on would be over 1000 bytes in cswiki. While bytes and characters don't mean the same thing, it's still too low I think :).

The most basic heuristic would just be to list those articles that have no internal wikilinks at all.

This part seems pretty straightforward with the pagelinks table. I'll leave this task unclaimed in case @Catrope wants to claim it, as the other parts are a little more complicated; otherwise I can come back to it when I'm done with the topic/task selection task.
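As a rough sketch of what the pagelinks approach might look like, the snippet below runs a "no outbound mainspace links" query against a toy in-memory SQLite database. The column names (`page_id`, `page_namespace`, `page_title`, `pl_from`, `pl_namespace`) follow the ones used in the queries in this task; the simplified table definitions, the `pl_title` column, and the sample rows are assumptions made purely for illustration, and the production schema has more columns and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy versions of the MediaWiki tables, with hypothetical sample rows.
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT
    );
    CREATE TABLE pagelinks (
        pl_from INTEGER,
        pl_namespace INTEGER,
        pl_title TEXT
    );
    INSERT INTO page VALUES
        (1, 0, 'Linked_article'),
        (2, 0, 'Dead_end_article'),
        (3, 2, 'Some_user_page');
    INSERT INTO pagelinks VALUES
        (1, 0, 'Dead_end_article'),
        (1, 0, 'Other_article');
""")

# Mainspace pages that have no row in pagelinks pointing at a mainspace target.
NO_OUTBOUND_LINKS_SQL = """
    SELECT page_title
    FROM page
    LEFT JOIN pagelinks ON pl_from = page_id AND pl_namespace = 0
    WHERE page_namespace = 0 AND pl_from IS NULL
    ORDER BY page_title
"""

dead_ends = [row[0] for row in conn.execute(NO_OUTBOUND_LINKS_SQL)]
print(dead_ends)  # ['Dead_end_article']
```

Note that pagelinks records every link on a page, including links that come from templates and infoboxes, so this only covers the most basic heuristic and not the "not counting infoboxes" variants.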

Moving back to Upcoming Work because this is not necessary for Newcomer Tasks v1.0.

I've looked into articles needing links at cswiki. For that, I've used a "bytes per link" metric, which is the length of the article divided by its number of mainspace links. I've calculated that metric for Czech featured articles (the average is 352 bytes per link, the maximum is 961 bytes per link). Then I've looked for articles where this metric is greater than 1000 bytes per link. A list of 50 such articles is at https://cs.wikipedia.org/wiki/Wikipedista:Martin_Urbanec_(WMF)/%C4%8Cl%C3%A1nky_s_m%C3%A1lo_odkazy. Many of those articles do indeed need more links. I'm going to review the list and add the links template where links are needed. Hopefully, this will increase the number of available link tasks.
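The metric itself is simple arithmetic and could be sketched as below. The function names and sample pages are hypothetical; the 1000 bytes-per-link threshold, the 3000-byte minimum length, and the exclusion of list pages (titles starting with "Seznam_") mirror the numbers and filters mentioned in this comment and its queries. One assumed difference from a join on pagelinks: pages with zero links are treated here as infinitely underlinked rather than dropped.

```python
def bytes_per_link(page_len: int, link_count: int) -> float:
    """Article length in bytes divided by its number of mainspace links."""
    if link_count == 0:
        return float("inf")  # assumption: no links at all counts as maximally underlinked
    return page_len / link_count


def underlinked_candidates(pages, threshold=1000, min_len=3000, excluded_prefix="Seznam_"):
    """Return (title, bytes_per_link) for sufficiently long pages above the threshold.

    `pages` is an iterable of (title, page_len, link_count) tuples.
    """
    result = []
    for title, page_len, link_count in pages:
        if page_len <= min_len or title.startswith(excluded_prefix):
            continue
        ratio = bytes_per_link(page_len, link_count)
        if ratio > threshold:
            result.append((title, ratio))
    return result


# Hypothetical sample data: (title, length in bytes, number of mainspace links).
sample_pages = [
    ("Article_A", 5000, 2),       # 2500 bytes per link
    ("Article_B", 5000, 20),      # 250 bytes per link
    ("Article_C", 2000, 1),       # too short to be considered
    ("Seznam_example", 9000, 1),  # list page, excluded
]
candidates = underlinked_candidates(sample_pages)
print(candidates)  # [('Article_A', 2500.0)]
```

With the featured-article figures quoted above (average 352, maximum 961 bytes per link), a threshold of 1000 sits just above the sparsest featured article, which is presumably why it was chosen.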

Used queries:

Calculate average bytes per link in featured articles

select avg(bytes_per_link)
from (
  select page_len/count(*) as bytes_per_link
  from pagelinks
  join page on pl_from=page_id
  where pl_from in (
      select cl_from
      from categorylinks
      join page on cl_from=page_id
      where cl_to="Wikipedie:Nejlepší_články"
    )
    and page_namespace=0
    and pl_namespace=0
  group by pl_from
) as tmp;

Find articles with bytes per link greater than X

select page_title, page_len/count(*) as bytes_per_link
from pagelinks
join page on page_id=pl_from
where page_len>3000
  and page_namespace=0
  and page_title not like "Seznam_%"
group by page_id
having bytes_per_link>1000
limit 50;
Urbanecm edited projects, added Growth-Team (Current Sprint); removed Growth-Team.
Urbanecm added a subscriber: revi.

As I'm now officially working on this :-), I've created https://tools.wmflabs.org/articles-needing-links/, which suggests articles needing links to experienced editors, making it easier to find such articles and add a maintenance template to them. It currently works for cswiki, kowiki and testwiki.

Quoting my message from internal chat:

Articles Needing Links is here

Hello, after discussion with @MMiller_WMF and @Trizek-WMF in my last check-in meeting, we decided to create a tool that suggests articles with a small number of links, so that wikis with few transclusions of the underlinked template can boost suggested edits a little.

The tool is now available for beta testing at https://tools.wmflabs.org/articles-needing-links/. It can be used on testwiki, cswiki and kowiki. It should be possible to get it running for any other project; it is only necessary to tell the algorithm the approximate density of links in that wiki's featured articles.

When an article is marked as needing links, the tool adds the underlinked template to the article. If it is marked as "links are okay", it is removed from the queue of suggested articles (it can reappear when the queue is repopulated; I still need to fix that).

@MMiller_WMF I've imported a few articles from cswiki to testwiki, so you (or others) can try the tool out without touching a production wiki. Because I've faked the queue for testwiki, the probability is incorrect there and is always set to 50 %. For production wikis, the number should be more meaningful.

@revi The algorithm's input is currently the same for cswiki and kowiki. I can customize it for kowiki, but I'd need to know the category containing your featured articles. Also, so that the tool does not add {{underlinked}} to articles with few links, I'd need to know the template's name on kowiki, as well as the translation of "Add {{underlinked}} template".

To anyone wondering: The source code is in Gerrit as labs/tools/articles-needing-links.

Let me know what you think!

I am going to resolve this for now, on the grounds that we are now approaching this challenge with the "add a link" structured task and its link recommendation model.