Newcomer tasks: investigate ability to identify articles with no outbound links
Open, Needs Triage, Public

Description

In T229430, we looked at which maintenance templates are available in our target wikis and how many articles are tagged with them. We have a couple of concerns:

  • Although thousands of articles have maintenance templates, we're concerned that once we narrow to topics of interest, there won't be enough articles for newcomers to work on.
  • The target wikis don't all have the same templates. For instance, while they all have some copy edit templates, only Arabic uses a template to tag articles that need more outgoing links.

Because of those concerns, it is worth investigating our ability to supplement the maintenance templates. The highest-priority item to investigate is the ability to detect which articles need more outgoing links, because we believe that is one of the best tasks for newcomers. This is the Arabic category storing articles that Arabic editors judge to have this condition: https://ar.wikipedia.org/wiki/تصنيف:جميع_مقالات_النهاية_المسدودة

The most basic heuristic would just be to list those articles that have no internal wikilinks at all.

A more sophisticated approach might have rules like these:

  • They are greater than 100 characters.
  • They have no internal wikilinks in the text of the article (not counting infoboxes).

Or even rules like these:

  • They have fewer than one wikilink per 500 characters.
  • They have no internal wikilinks in the text of the article (not counting infoboxes).
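The two rule sets above could be sketched roughly as follows. This is only an illustration under stated assumptions: it works on raw wikitext, treats everything inside `{{...}}` as template content (which approximates "not counting infoboxes" but also drops other templates), counts characters of the template-stripped wikitext including markup, and only recognizes English `File:`/`Image:`/`Category:` prefixes. The second list repeats the "no internal wikilinks" bullet, so the density rule is implemented on its own here. All function names are hypothetical.

```python
import re

# Matches [[Target]] and [[Target|label]]; captures the link target.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

# Assumption: only English namespace prefixes; real wikis use localized names too.
NON_ARTICLE_PREFIXES = ("file:", "image:", "category:")


def strip_templates(wikitext):
    """Drop {{...}} blocks, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def body_wikilinks(wikitext):
    """Return wikilink targets found outside templates, skipping files and categories."""
    body = strip_templates(wikitext)
    targets = [m.group(1).strip() for m in WIKILINK_RE.finditer(body)]
    return [t for t in targets if not t.lower().startswith(NON_ARTICLE_PREFIXES)]


def rule_no_body_links(wikitext, min_chars=100):
    """First rule set: longer than min_chars and no wikilinks in the body."""
    body = strip_templates(wikitext)
    return len(body) > min_chars and not body_wikilinks(wikitext)


def rule_low_link_density(wikitext, chars_per_link=500):
    """Density rule: fewer than one body wikilink per chars_per_link characters."""
    body = strip_templates(wikitext)
    return len(body_wikilinks(wikitext)) < len(body) / chars_per_link
```

Both thresholds are parameters so they can be tuned per wiki rather than fixed at 100 and 500.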

As an output, it would be good to know how many articles in each of our target wikis fit these sorts of rules. In Arabic Wikipedia, we would also want to know how many do and don't overlap with the articles having this category: https://ar.wikipedia.org/wiki/تصنيف:جميع_مقالات_النهاية_المسدودة
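For the Arabic overlap question, a simple set comparison would be enough once both title lists are available. The sketch below assumes the heuristic results and the category members have already been fetched as lists of page titles; `overlap_report` is a hypothetical helper name.

```python
def overlap_report(heuristic_titles, category_titles):
    """Count articles flagged by the heuristic, by the category, or by both."""
    heuristic = set(heuristic_titles)
    category = set(category_titles)
    return {
        "both": len(heuristic & category),
        "heuristic_only": len(heuristic - category),
        "category_only": len(category - heuristic),
    }
```

The `heuristic_only` and `category_only` counts are the interesting ones: the first suggests articles editors have not tagged yet, the second suggests cases the rules miss.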

Event Timeline

I'm moving this to Ready for Development, because I think it's likely we'll want this ability at some point, and other newcomer task tickets aren't ready yet.

MMiller_WMF updated the task description. Aug 29 2019, 12:37 AM

They are greater than 100 characters.

That is too low IMO. An article that's long enough to spend some time on would be over 1000 bytes in cswiki. While bytes and characters aren't the same thing, I think it's still too low :).
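To illustrate the bytes-versus-characters gap: in UTF-8, accented Czech letters and Arabic letters take two bytes each, so a byte-based length (which is what the `page_len` field in MediaWiki's `page` table reports) runs higher than a character count. A small sketch, with a hypothetical helper name:

```python
def char_and_byte_length(text):
    """Return (number of characters, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


# "kůň" (Czech for horse): k is 1 byte, ů and ň are 2 bytes each.
czech = "k\u016f\u0148"
# "ويكي" (Arabic for wiki): four letters, 2 bytes each.
arabic = "\u0648\u064a\u0643\u064a"
```

So any threshold should state explicitly which unit it uses, since the same number means roughly half as much text for an Arabic article when counted in bytes.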

The most basic heuristic would just be to list those articles that have no internal wikilinks at all.

This part seems pretty straightforward with the pagelinks table. I'll leave this task unclaimed in case @Catrope wants to claim it, since the other parts are a little more complicated; otherwise I can come back to it when I'm done with the topic/task selection task.
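A rough sketch of what the pagelinks approach could look like, run here against an in-memory SQLite database so it is self-contained. The table and column names follow the classic MediaWiki `page` and `pagelinks` layout (`pl_from`, `pl_namespace`, `pl_title`), but the schema below is a simplified stand-in and an assumption; newer MediaWiki versions store link targets differently, so the query would need adapting. Note also that pagelinks records links coming from transcluded templates, so this covers only the most basic heuristic, not the "not counting infoboxes" variants.

```python
import sqlite3

# Articles (namespace 0, not redirects) with no row at all in pagelinks.
NO_OUTLINKS_SQL = """
SELECT p.page_title
FROM page AS p
LEFT JOIN pagelinks AS pl ON pl.pl_from = p.page_id
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND pl.pl_from IS NULL
ORDER BY p.page_title
"""


def build_demo_db():
    """Create a tiny in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title TEXT NOT NULL,
            page_is_redirect INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE pagelinks (
            pl_from INTEGER NOT NULL,
            pl_namespace INTEGER NOT NULL,
            pl_title TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 0, "Linked_article", 0),
        (2, 0, "Dead_end_article", 0),
        (3, 0, "Some_redirect", 1),      # redirect, excluded by the query
        (4, 1, "Dead_end_article", 0),   # talk page, excluded by the query
    ])
    conn.executemany("INSERT INTO pagelinks VALUES (?, ?, ?)", [
        (1, 0, "Dead_end_article"),
    ])
    return conn


def pages_without_outlinks(conn):
    """Return titles of articles that have no outgoing links recorded."""
    return [row[0] for row in conn.execute(NO_OUTLINKS_SQL)]
```

Against the real replicas the same anti-join would presumably be run per target wiki, possibly with an added minimum length filter on `page_len`.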