
Special:Statistics gives impossibly large/small counts due to underflow
Closed, Resolved (Public)

Description

Author: CrazyDreamer

Description:
Upon installation, Special:Statistics gave correct information for the number of "probably
legitimate content pages" (0). During the course of customization, I renamed a few of the
namespaces, and (probably more importantly) moved the Main Page to the Project: namespace (here
renamed Meta:) and altered MediaWiki:Mainpage to point there as well. I then deleted the redirect
created by the move. After these customizations, the Special:Statistics page gave the following:

There are 1,279 total pages in the database. This includes "talk" pages, pages about Æ,
minimal "stub" pages, redirects, and others that probably don't qualify as content pages. Excluding
those, there are 18,446,744,073,709,551,615 pages that are probably legitimate content pages.

Obviously I didn't have 18,446,744,073,709,551,615 pages to be legitimate content. I created a
page in the primary namespace and checked the statistics again:

There are 1,280 total pages in the database. This includes "talk" pages, pages about Æ,
minimal "stub" pages, redirects, and others that probably don't qualify as content pages. Excluding
those, there are 0 pages that are probably legitimate content pages.

Fine. Now I deleted the page and re-checked; I got the first message again, verbatim. I undeleted
the page and the message did not change. That is to say, after the delete I received . . .

There are 1,279 total pages in the database. This includes "talk" pages, pages about Æ,
minimal "stub" pages, redirects, and others that probably don't qualify as content pages. Excluding
those, there are 18,446,744,073,709,551,615 pages that are probably legitimate content pages.

. . . regardless of an undelete.

I suppose that qualifies as two bugs, actually; let me know if you want me to re-file this report
so that they can be separated.

In case it matters, my PHP and MySQL versions are listed by Special:Version as the following:
PHP: 4.3.2 (apache2filter)
MySQL: 3.23.58


Version: 1.11.x
Severity: minor
URL: http://ash.crazydreams.org/index.php?title=Special:Statistics

Details

Reference
bz4650

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 9:02 PM
bzimport set Reference to bz4650.
bzimport added a subscriber: Unknown Object (MLST).

The total of 1,279 pages is correct, as it includes pages in other namespaces,
including the MediaWiki: namespace, which is populated with all the interface
text when the system is installed.

Presumably the very large article count occurs because the code subtracts one
from the total for the 'Main Page', as this does not count as an article. As
there is no main page nor any other articles in the main namespace, the code
subtracts one from zero, causing the unsigned counter to wrap around to the
highest possible number it can hold, hence the huge number that is being displayed.
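
The reported figure is consistent with that explanation if the counter is a 64-bit unsigned integer, which is an assumption here rather than something confirmed in this report. A minimal sketch of the arithmetic (the helper name `unsigned_decrement` is hypothetical, and Python integers do not wrap on their own, so the wraparound is emulated with a modulus):

```python
def unsigned_decrement(value, bits=64):
    """Subtract 1 the way a fixed-width unsigned integer would.

    Assumption: the counter is `bits` wide and unsigned, so results
    are reduced modulo 2**bits instead of going negative.
    """
    return (value - 1) % (2 ** bits)


# Zero content pages, minus one for the Main Page:
wrapped = unsigned_decrement(0)
print(f"{wrapped:,}")  # 18,446,744,073,709,551,615
```

The printed value matches the number shown by Special:Statistics in the description.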

I would suggest that the software be modified so it doesn't subtract 1 from the
count, as the main page would be considered legitimate content in a lot of
projects, and in the current WM projects where there are hundreds of thousands
of articles, an extra one makes little difference.

robchur wrote:

*** Bug 7378 has been marked as a duplicate of this bug. ***

robchur wrote:

*** Bug 9192 has been marked as a duplicate of this bug. ***

Recommended fixes:
a) Change the field from unsigned to signed
b) Fix the UPDATE to prevent underflows below 0 when the page counters are
decremented.

(Perhaps having it regenerate the stats when an invalid value is detected
wouldn't hurt.)
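
One possible shape for fix (b) is a guarded decrement, where the UPDATE only fires while the counter is still above zero. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, and the table and column names are a simplified, illustrative schema, not necessarily the real one; the function names are hypothetical.

```python
import sqlite3


def make_stats_db(good_articles):
    """Create a throwaway stats table (illustrative schema, SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE site_stats ("
        " ss_row_id INTEGER PRIMARY KEY,"
        " ss_good_articles INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO site_stats VALUES (1, ?)", (good_articles,))
    return conn


def decrement_good_articles(conn):
    """Decrement the counter, but never below zero.

    The WHERE clause skips the row when the counter is already 0,
    so the subtraction that would underflow is never executed.
    """
    conn.execute(
        "UPDATE site_stats SET ss_good_articles = ss_good_articles - 1"
        " WHERE ss_row_id = 1 AND ss_good_articles > 0"
    )


def good_articles(conn):
    """Read the current counter value."""
    row = conn.execute(
        "SELECT ss_good_articles FROM site_stats WHERE ss_row_id = 1"
    ).fetchone()
    return row[0]


conn = make_stats_db(1)
decrement_good_articles(conn)  # 1 -> 0
decrement_good_articles(conn)  # stays at 0 instead of wrapping
print(good_articles(conn))
```

Putting the guard in the WHERE clause keeps the check and the write in a single statement, so it does not depend on a separate read beforehand.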

thomas.dalton wrote:

Doesn't (b) make (a) unnecessary? It should never be negative, so it seems wasteful to have it signed.

ayg wrote:

(a) does not make (b) unnecessary, it would just have to detect overflows instead of underflows.

thomas.dalton wrote:

I said (b) makes (a) unnecessary, not the other way around.

ayg wrote:

Er, right, but you confused me by saying it's wasteful to have it signed. Doesn't that imply you agree with (a) too? (b) does make (a) unnecessary, strictly speaking, but there's no reason not to allow people to have more than 2,000,000,000 articles (see http://www.gaiaonline.com/, with over a billion posts . . . yikes).

thomas.dalton wrote:

I think you're reading (a) backwards as well. It is currently unsigned; Brion is suggesting making it signed.

ayg wrote:

Hah, you're right there too. Never mind, I give up. :P

The sensible thing is probably to add some sanity checking into the SiteStats class lazy initialization.

Currently it checks for an empty or missing row and recounts the data; checking for invalid data (negative counts or impossibly high and thus wraparound counts) could do the same.

The UPDATEs for decrementing could also do a check there, maybe.
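
A rough sketch of what such a check at load time could look like. This is not the actual SiteStats code: the function names, the dictionary representation of the stats row, and the plausibility ceiling are all assumptions made for illustration.

```python
# Assumed ceiling for a believable count; anything above it is treated
# as a wrapped-around value. The exact threshold is a judgment call.
MAX_PLAUSIBLE = 2 ** 31 - 1


def is_sane(count):
    """True if the count is present, non-negative, and not impossibly large."""
    return count is not None and 0 <= count <= MAX_PLAUSIBLE


def load_stats(row, recount):
    """Return the stored stats row, or recount if it is missing or invalid.

    `row` is a dict of counter name -> value (None or empty if no row exists);
    `recount` is a callable that rebuilds the counters from scratch.
    """
    if not row or not all(is_sane(value) for value in row.values()):
        return recount()
    return row


def fake_recount():
    # Hypothetical stand-in for a real recount over the page tables.
    return {"good_articles": 0, "images": 0}


print(load_stats({"good_articles": 2 ** 64 - 1, "images": 3}, fake_recount))
print(load_stats({"good_articles": 5, "images": 3}, fake_recount))
```

The first call falls back to the recount because the article counter looks like a wraparound; the second returns the stored row unchanged.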

robchur wrote:

*** Bug 10600 has been marked as a duplicate of this bug. ***

ayg wrote:

(Let's keep the summary human-readable, here.)

dario wrote:

I don't think bug 10600 is the same issue as the one mentioned in the summary. Where does the number -1396 (the number of images) on es.wikipedia.org come from?

ayg wrote:

Underflow. It's the same issue.

dario wrote:

The number of images starts at zero. It increments when an image is uploaded and decrements when one is deleted, so that number cannot be less than zero. Why is there an underflow down to -1396 images?

ayg wrote:

Beats me, but who cares? It should be caught at display time even if you manually alter the database. Fixed in r24176.

thomas.dalton wrote:

Sanity checking is great, but we should try and keep things sane in the first place.

ayg wrote:

Reasonable, but I don't see any paths in the code that could lead to image creation without triggering site_stats updates, at a superficial look. If the problem is discovered, well and good, but until then it's best to just make sure the stats get regenerated occasionally (which I'm not sure we do, but if we don't we probably should).

*** Bug 11220 has been marked as a duplicate of this bug. ***