
Statistic page not entirely correct
Open, Low, Public

Description

Author: a1

Description:
Could anybody explain to me whether Ukrainian Wikipedia has reached 10 M edits or not? According to http://uk.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B5%D1%86%D1%96%D0%B0%D0%BB%D1%8C%D0%BD%D0%B0:%D0%A1%D1%82%D0%B0%D1%82%D0%B8%D1%81%D1%82%D0%B8%D0%BA%D0%B0 - yes. But according to https://uk.wikipedia.org/w/index.php?oldid=10000000 - not yet (only 9.82 M). Which statistic is true and which is false? I think there is no reason to have a true and a false statistic at the same time. The true one is preferable.


Version: 1.23.0
Severity: normal

Details

Reference
bz38085

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:49 AM
bzimport set Reference to bz38085.
bzimport added a subscriber: Unknown Object (MLST).

a1 wrote:

*** This bug has been confirmed by popular vote. ***

(In reply to comment #0)

Could anybody explain to me whether Ukrainian Wikipedia has reached 10 M edits or not?
According to
http://uk.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B5%D1%86%D1%96%D0%B0%D0%BB%D1%8C%D0%BD%D0%B0:%D0%A1%D1%82%D0%B0%D1%82%D0%B8%D1%81%D1%82%D0%B8%D0%BA%D0%B0
- yes. But according to https://uk.wikipedia.org/w/index.php?oldid=10000000 -
not yet (only 9.82 M). Which statistic is true and which is false? I think
there is no reason to have a true and a false statistic at the same time. The true
one is preferable.

This is a long-known issue with Special:Statistics. Its numbers are not perfectly accurate, and never have been. The lower number is the correct one here.

This is probably a duplicate bug.

a1 wrote:

Thank you for the reply. Then what's the reason for displaying incorrect numbers on the Statistics page? The correct ones would be preferable, wouldn't they?

(In reply to comment #3)

Thank you for the reply. Then what's the reason for displaying incorrect numbers on
the Statistics page? The correct ones would be preferable, wouldn't they?

Yes, but the number isn't necessarily easy to get.

For large tables (like revision), it's inefficient to do COUNT(*) to get that number. So we estimate.

Other counts like "number of pages" are handled via the site_stats table, which is subject to inaccuracies for other reasons.
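A minimal sketch of why a cached counter and an exact count can disagree, using an in-memory SQLite database. The two-table layout, the function names, and the import path that skips the counter are all hypothetical simplifications for illustration, not MediaWiki's actual schema or code.

```python
import sqlite3

# Hypothetical, simplified schema: table and column names are illustrative
# stand-ins, not MediaWiki's real definitions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text TEXT)")
conn.execute("CREATE TABLE site_stats (total_edits INTEGER NOT NULL)")
conn.execute("INSERT INTO site_stats (total_edits) VALUES (0)")


def save_edit(text):
    """Normal edit path: insert the row and bump the cached counter."""
    conn.execute("INSERT INTO revision (rev_text) VALUES (?)", (text,))
    conn.execute("UPDATE site_stats SET total_edits = total_edits + 1")


def bulk_import(texts):
    """Hypothetical import path that forgets to update the counter."""
    conn.executemany(
        "INSERT INTO revision (rev_text) VALUES (?)", [(t,) for t in texts]
    )


def cached_edit_count():
    """Cheap: reads a single row."""
    return conn.execute("SELECT total_edits FROM site_stats").fetchone()[0]


def exact_edit_count():
    """Exact, but has to visit every row of a potentially huge table."""
    return conn.execute("SELECT COUNT(*) FROM revision").fetchone()[0]


for i in range(5):
    save_edit(f"edit {i}")
bulk_import(["imported a", "imported b"])

print(cached_edit_count(), exact_edit_count())
```

The cached value stays cheap to read no matter how large the table grows, but any code path that writes rows without touching the counter leaves the two numbers out of step until someone recounts.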

Well, but why can't we calculate it a different way?

For example, as SELECT MAX(cur_id) FROM cur. It would be a more accurate value than COUNT(*) FROM revision.

Maybe we could run initStats.php --active --noviews --update on that wiki?

a1 wrote:

If this could fix a bug, maybe

On http://abj.jidanni.org/index.php?title=Special:Statistics
which uses super fresh up to date MediaWiki,
we note
Content pages 14
Pages
(All pages in the wiki, including talk pages, redirects, etc.) 21

Now we click
http://abj.jidanni.org/index.php?title=Special:AllPages
which is linked to the words "Content pages",
and lo and behold, there are three columns of 22 items each.
Actually the first column is 23 items.
That makes 23+22+22, which is very much more than 14.
This is a blatant bug, no?

(In reply to comment #5)

Well, but why can't we calculate it a different way?

For example, as SELECT MAX(cur_id) FROM cur. It would be a more accurate value
than
COUNT(*) FROM revision.

cur is no longer the name of that table. As for your general point, the maximum id would also include pages that have since been deleted, because id numbers generally do not get reused.
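The gap between the highest id and the real row count can be sketched with a toy table in SQLite. The `page` table here is a hypothetical stand-in, and `AUTOINCREMENT` is used only to reproduce the "ids are not reused" behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite never reuse an id, mirroring the behaviour
# described above. The table is a hypothetical stand-in.
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
)
conn.executemany(
    "INSERT INTO page (title) VALUES (?)",
    [("A",), ("B",), ("C",), ("D",), ("E",)],
)

# Delete two pages, then create one more: the new page gets a fresh id.
conn.execute("DELETE FROM page WHERE title IN ('B', 'D')")
conn.execute("INSERT INTO page (title) VALUES ('F')")

max_id = conn.execute("SELECT MAX(page_id) FROM page").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
print(max_id, row_count)  # prints "6 4"
```

Six ids have been handed out but only four rows remain, so the maximum id overstates the count by exactly the number of deletions.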

(In reply to comment #8)

On http://abj.jidanni.org/index.php?title=Special:Statistics
which uses super fresh up to date MediaWiki,
we note
Content pages 14
Pages
(All pages in the wiki, including talk pages, redirects, etc.) 21

Now we click
http://abj.jidanni.org/index.php?title=Special:AllPages
which is linked to the words "Content pages",
and lo and behold, there are three columns of 22 items each.
Actually the first column is 23 items.
That makes 23+22+22, which is very much more than 14.
This is a blatant bug, no?

Content pages are only those pages in the main namespace that contain links (see [[Manual:$wgArticleCountMethod]]). Perhaps some of your pages do not meet that definition of a content page.
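A rough sketch of how such a definition makes the "content pages" figure much smaller than the total page count. The `Page` class, the `is_content_page` helper, and the sample pages are hypothetical; excluding redirects and the `"any"` branch (every main-namespace page counts) are assumptions added for contrast, and the real check in MediaWiki may differ in detail.

```python
from dataclasses import dataclass

MAIN_NAMESPACE = 0


@dataclass
class Page:
    title: str
    namespace: int
    is_redirect: bool
    text: str


def is_content_page(page, method="link"):
    """Simplified test: main namespace, not a redirect, and (for the
    'link' method) at least one [[wikilink]] in the text."""
    if page.namespace != MAIN_NAMESPACE or page.is_redirect:
        return False
    if method == "link":
        return "[[" in page.text
    return True  # 'any': every non-redirect main-namespace page counts


pages = [
    Page("Gettysburg Address", 0, False, "Four score and seven years ago..."),
    Page("Linked article", 0, False, "See [[Other article]]."),
    Page("Talk:Linked article", 1, False, "Discussion with a [[link]]."),
    Page("Old name", 0, True, "#REDIRECT [[Linked article]]"),
]

content_link = sum(is_content_page(p, "link") for p in pages)
content_any = sum(is_content_page(p, "any") for p in pages)
print(len(pages), content_link, content_any)  # prints "4 1 2"
```

Of four pages, only one qualifies under the link-based rule: the talk page is outside the main namespace, the redirect is excluded, and the link-free article has text but no wikilink.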

Well,

  1. No user could have ever guessed such a definition, therefore please add a mouseover explaining the definition.

  2. A speech by Abraham Lincoln might not contain any links, but will still contain content, whereas a spamfarm page might be totally links, but most would agree devoid of content. Therefore the wording is misleading.

  3. Nowhere else on the statistics page do we find anything close to 67! Therefore the most important statistic is not presented.

P.S., I clicked [[Manual:$wgArticleCountMethod]] and it said no such page, and suggested "Did you mean: Manual:$articlecountupdate" which upon clicking doesn't exist either. (Bug: it shouldn't suggest things that it knows don't exist.)

Sorry, I meant [[mw:Manual:$wgArticleCountMethod]].

The definition is meant to exclude stubs on Wikipedia (like most things in MediaWiki, especially features that go way back, it's a bit Wikipedia-centric).

I would be fine with having explanatory text (I would suggest just having it in small print under the words "Content pages"). File a separate bug for that.


  1. Nowhere else on the statistics page do we find anything close to 67!

Therefore the most important statistic is not presented.

You're right, and that's obviously wrong. I notice some of those pages date back to 2005, so it's hard to know whether it's some old bug or something else. It should also be noted that some import scripts, like the Java dump importer, (probably) don't update the count properly. Or perhaps there are other bugs in the statistics counter.

Instead of me reporting more bugs, the whole statistics page needs to be rethought from the point of view of the man in the street.

Likely nobody will ever work on that part if it's hidden in a comment in some bug report, instead of a clear and separate bug report.