Wiktionary needs an option to include all in the article count
Closed, Resolved, Public

Description

Author: lars

Description:
According to http://www.mediawiki.org/wiki/Manual:Article_count
the article count can either be link count or comma count.

Wiktionary (in English, Swedish and some other languages) needs to
count all articles in the main namespace that aren't redirects.
To this end, bots add a fake link to every page so that it is
included in the link count. This is of course highly inefficient.

MediaWiki needs another option that counts all articles,
and this option should be activated on Wiktionary.


Version: unspecified
Severity: enhancement

Details

Reference
bz26033

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 11:15 PM
bzimport set Reference to bz26033.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 8214
add new variable $wgArticleCountMethod to support adding any non-redirect in content ns as a good article.

Here's a patch that adds a new variable $wgArticleCountMethod to replace $wgUseCommaCount. It supports three methods: comma, link, or any.
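
To make the three methods concrete, here is a rough Python model of the check. It is not the attached patch (which is PHP): the function name is_countable and its parameters are hypothetical, and the comma and link tests simply look for "," and "[[" in the wikitext, as described in this report.

```python
def is_countable(text, is_redirect, in_content_ns, method):
    """Hypothetical model of the article-count check; not MediaWiki code."""
    # Redirects and pages outside content namespaces never count.
    if is_redirect or not in_content_ns:
        return False
    if method == "any":
        # Proposed option: every non-redirect page in a content namespace.
        return True
    if method == "comma":
        return "," in text
    if method == "link":
        return "[[" in text
    raise ValueError(f"unknown method: {method!r}")


# A short dictionary-style entry with no link and no comma.
entry = "A small domesticated feline."
for method in ("comma", "link", "any"):
    print(method, is_countable(entry, False, True, method))
```

Under this model only the "any" method counts such an entry, which is the case the Wiktionary request is about.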

However, we currently don't have any maintenance scripts that rebuild the article count with this definition. The existing maintenance scripts do something similar, but not quite the same:

*maintenance/updateArticleCount.php considers a page good if it is in a content namespace, is not a redirect, and contains an outgoing internal link to some page. This differs from Article::isCountable in that interwiki links, or just a plain <nowiki>[[</nowiki>, are not counted, whereas Article::isCountable does count them.
*maintenance/initStats.php (and friends in includes/SiteStats.php) considers a page to be a "good" article if it is in a content namespace, is not a redirect, and has a length greater than 0 (can pages even have a zero length?). This is quite different from the Article::isCountable definition.
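
The three definitions can be compared side by side with a small Python sketch. Everything here is a simplified model built from the descriptions above, not the actual PHP: the Page class, its internal_links field (standing in for recorded outgoing links to local pages), and the function names are hypothetical, and by_is_countable assumes the wiki uses the link method.

```python
from dataclasses import dataclass


@dataclass
class Page:
    text: str
    internal_links: int  # hypothetical: number of outgoing links to local pages
    is_redirect: bool = False
    in_content_ns: bool = True


def eligible(page):
    # Shared precondition: content namespace and not a redirect.
    return page.in_content_ns and not page.is_redirect


def by_update_article_count(page):
    # Model of updateArticleCount.php: needs a real internal link.
    return eligible(page) and page.internal_links > 0


def by_init_stats(page):
    # Model of initStats.php: any non-empty page.
    return eligible(page) and len(page.text) > 0


def by_is_countable(page):
    # Model of Article::isCountable with the link method: a raw "[[" is enough.
    return eligible(page) and "[[" in page.text


pages = {
    "plain text": Page("Just some text.", 0),
    "interwiki only": Page("See [[fr:chat]].", 0),
    "internal link": Page("See [[cat]].", 1),
}
for name, page in pages.items():
    print(name, by_update_article_count(page), by_init_stats(page), by_is_countable(page))
```

In this model the three checks agree only on the page with a real internal link; the plain-text and interwiki-only pages are each counted by some definitions and not others.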

These differences are bad because the count won't correct itself over time. On each edit, 1 is added to the count only if the previous version is not good under the current definition and the next version is good.

For example, say a page contains just 'foo[[bar]]'. You start out counting by [[, so this is counted as a good article, and you have 1 good article (assuming this is the only article). You then switch to comma count and make a null edit to this page. You still have 1 good article, because MediaWiki sees that neither the previous nor the current version of 'foo[[bar]]' is good (it has no comma), so the count is left unchanged. It is therefore impossible for the count to get back on track: with the comma count method it is only decremented when an edit removes the comma from a page, even if there are no commas whatsoever in the wiki.
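
The drift in this example can be reproduced with a few lines of Python. This is only a model of the incremental update described above, not MediaWiki code: is_good and apply_edit are hypothetical names, and page creation is modelled as an edit from empty text.

```python
def is_good(text, method):
    # Simplified check: "link" looks for "[[", anything else looks for ",".
    return ("[[" in text) if method == "link" else ("," in text)


def apply_edit(count, old_text, new_text, method):
    """Incremental update: both revisions are judged by the current method."""
    old_good = is_good(old_text, method)
    new_good = is_good(new_text, method)
    if new_good and not old_good:
        return count + 1
    if old_good and not new_good:
        return count - 1
    return count


page = "foo[[bar]]"

# Page creation under the link method (empty text -> 'foo[[bar]]').
count = apply_edit(0, "", page, "link")

# Switch to the comma method, then make a null edit.
count = apply_edit(count, page, page, "comma")

# What a full recount under the comma method would report.
recount = sum(is_good(text, "comma") for text in [page])

print(count, recount)  # 1 0
```

The stored count stays at 1 while a recount gives 0, and no later edit to this page can close the gap unless a comma is first added and then removed.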

It would perhaps solve everyone's problems if we ditched the whole page link thing and just counted pages that are in a content namespace, are not redirects, and have a certain minimum size (or, if we insist on page links, used actual links instead of just looking for [[).

(In reply to comment #1)

This wouldn't cause any problems for wikis that already cause all pages to be counted by inserting fake links into all pages, would it?

Not unless the refresh script was run on them (which is very unlikely to happen unless someone explicitly asks for it to be run).

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*