napwikisource reports more content pages than all pages in its main namespace
Open, Stalled, Needs Triage, Public

Description

Our newest content wiki, the Neapolitan Wikisource, just (presumably) had its stats recounted on August 1st, for the first time since it was created and its content was imported. The "content pages" count dropped from 10,167 the previous day to 413. The reason I am opening this task is not the drop itself (which is large but not unprecedented for new wikis) but the count that it dropped to.

413 content pages is 230 more than the total number of non-redirects in the main namespace (183).

(The main namespace is the wiki's only content namespace and the wiki uses the default 'link' counting method.)
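
As a quick sanity check of the figures above, here is a minimal Python sketch. The function name is made up for illustration, and the only inputs are the two numbers quoted in this description; it assumes that the number of non-redirect pages in the content namespace is an upper bound for the content-page count.

```python
# Figures quoted in the task description.
reported_content_pages = 413
main_ns_non_redirects = 183


def excess_over_upper_bound(reported: int, upper_bound: int) -> int:
    """Return how far a reported count exceeds its upper bound (0 if it does not)."""
    return max(0, reported - upper_bound)


# Hypothetical helper applied to the reported numbers.
excess = excess_over_upper_bound(reported_content_pages, main_ns_non_redirects)
print(excess)
```

Any non-zero result means the reported count cannot be explained by the pages that exist in the content namespace.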

How is this possible? How could it be fixed? Am I missing something here that makes this actually not an error?

Event Timeline

Dcljr created this task. Mon, Sep 2, 5:54 AM
Restricted Application added a subscriber: Aklapper. Mon, Sep 2, 5:54 AM
Dcljr updated the task description. Mon, Sep 2, 5:55 AM
Dcljr updated the task description.

Not sure which exact project tags to set here, but for some general information:

  • updateArticleCount.php runs on the 1st and 15th of a month. Hence numbers were updated.
  • updateArticleCount.php might have bugs, see for example T212706.
  • Note that the default for wgArticleCountMethod is set to link in InitialiseSettings.php.txt, so a page will only be counted if it contains a wiki link.
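
To make the difference between the counting methods concrete, here is a rough Python sketch of the rule as described in the bullets above. It is a toy model, not MediaWiki's actual implementation: the `Page` class, the helper names, and the sample pages are all hypothetical, and "contains a wiki link" is reduced to a boolean flag.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    namespace: int      # 0 stands for the main namespace
    is_redirect: bool
    has_wikilink: bool  # stand-in for "the page contains a wiki link"


# Assumption: the main namespace is the only content namespace.
CONTENT_NAMESPACES = {0}


def is_content_page(page: Page, method: str) -> bool:
    """Decide whether a page counts as a content page under 'any' or 'link'."""
    if method not in ("any", "link"):
        raise ValueError(f"unknown counting method: {method!r}")
    if page.namespace not in CONTENT_NAMESPACES or page.is_redirect:
        return False
    # 'any' counts every remaining page; 'link' also requires a wiki link.
    return True if method == "any" else page.has_wikilink


def count_content_pages(pages, method: str) -> int:
    return sum(is_content_page(p, method) for p in pages)


# Made-up sample pages, for illustration only.
sample_pages = [
    Page("Text with a link", 0, False, True),
    Page("Text without links", 0, False, False),
    Page("Old title", 0, True, True),           # redirect: never counted
    Page("Talk:Text with a link", 1, False, True),  # not a content namespace
]

print(count_content_pages(sample_pages, "any"))
print(count_content_pages(sample_pages, "link"))
```

In this model every page counted by 'link' is also counted by 'any', so the 'link' count can never be the larger of the two.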
Urbanecm claimed this task. Tue, Sep 3, 4:40 AM
Urbanecm added a subscriber: Urbanecm.

I'd treat the script tag as a de facto subproject of the site-requests one. Either is fine IMO; people willing to look into this should watch both.

Claiming, will have a look soon.

Restricted Application added a project: User-Urbanecm. Tue, Sep 3, 4:40 AM
  • Note that the default for wgArticleCountMethod is set to link in InitialiseSettings.php.txt, so a page will only be counted if it contains a wiki link.

We had this problem on srwikisource and it was resolved by doing this; @Aklapper is right. But @Urbanecm claimed this task, so I would like him to say which is better.

Dcljr added a comment. Fri, Sep 6, 1:08 AM

@Zoranzoki21 : Umm… doing what? Aklapper's comment didn't contain any suggestion of something to "do".

Dcljr added a comment. Fri, Sep 6, 1:17 AM

BTW, I already acknowledged the 'link' counting method was being used in the task description. My point about "non-redirects in the main namespace" is that the count given by 'link' should not be higher than the highest possible count 'any' could give, which in this case is (or was) 183.

(Hey, that would be an interesting test: change to 'any', recount, note the number, change back to 'link', recount again. But no one's actually going to do that… [grin])

Dcljr added a comment. Fri, Sep 6, 1:30 AM

OBTW (again), I guess I should have also pointed out that the total number of pages in the main namespace (including redirects and the Main Page) is now 206 (would have been 205 on Sep 2).

Not that this number would be given by any counting method (it wouldn't), but just for the sake of completeness…

Urbanecm removed Urbanecm as the assignee of this task. Sat, Sep 7, 4:56 PM
I just checked whether the script is a solution, and forgot to update the task.

Ok, so we should set wgArticleCountMethod to any and I will claim this task.

Restricted Application added a project: User-Zoranzoki21. Sat, Sep 7, 7:25 PM
Dcljr added a comment. Sun, Sep 8, 12:20 AM

Whoa, hang on… is this what the napwikisource community wants?

Urbanecm changed the task status from Open to Stalled. Sun, Sep 8, 1:16 AM
Urbanecm moved this task from Backlog to Config - to process on the Wikimedia-Site-requests board.

Good question. Zoranzoki, can you ask at their village pump to confirm this request?

Zoran or Kizule, whichever you prefer to call me... :)

I asked at their village pump; the question is here.

Dcljr added a comment. Mon, Sep 9, 3:20 AM

OK, it's fine to seek consensus about this change, but why is this the proposed solution to the reported problem? This is not trying to fix the underlying issue (whatever it may be), just trying to avoid it — and doing so in a way that presumably will not be acceptable to at least some wikis that may encounter the same problem in the future (so this cannot be used as a general workaround).

BTW, even if this does gain consensus at napwikisource, I suggest not acting on it until the wiki is recounted again. @Urbanecm, did you actually do this? Your comment above isn't completely clear about this. (Would be nice to see the results, too.)

Is there any other maintenance script that might affect article counting that could (also) be tried? Like rebuilding/repopulating/whatever the pagelinks database for the wiki? (I don't know…)
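
For illustration, here is a rough Python sketch of what an independent recount against the database could look like. It assumes a heavily simplified, hypothetical version of the page and pagelinks tables (only the columns needed here) held in an in-memory SQLite database with made-up rows; the `recount` function is not a real maintenance script, and the query is only one plausible way to express the 'any' and 'link' rules.

```python
import sqlite3


def recount(conn, content_namespaces, method):
    """Count non-redirect pages in the content namespaces.

    With method 'link', only pages that have at least one row in the
    (simplified) pagelinks table are counted.
    """
    if method not in ("any", "link"):
        raise ValueError(f"unknown counting method: {method!r}")
    placeholders = ", ".join("?" for _ in content_namespaces)
    sql = "SELECT COUNT(DISTINCT page_id) FROM page"
    if method == "link":
        sql += " JOIN pagelinks ON pl_from = page_id"
    sql += f" WHERE page_is_redirect = 0 AND page_namespace IN ({placeholders})"
    return conn.execute(sql, list(content_namespaces)).fetchone()[0]


# Simplified, hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_is_redirect INTEGER
    );
    CREATE TABLE pagelinks (pl_from INTEGER, pl_title TEXT);
    """
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 0, 0), (3, 0, 1), (4, 1, 0)],
)
conn.executemany(
    "INSERT INTO pagelinks VALUES (?, ?)",
    [(1, "X"), (1, "Y"), (3, "Z"), (4, "W")],
)

print(recount(conn, [0], "any"), recount(conn, [0], "link"))
```

Comparing both results with the stored content-page count would show whether the stored value is consistent with either method.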

Dcljr added a comment. Mon, Sep 9, 11:34 AM

…Or am I just misunderstanding and this (@Zoranzoki21's suggestion) is not being proposed as a "permanent fix" to the problem?

@Dcljr My suggestion is intended as a permanent fix for the problem, if possible.

Dcljr added a comment. Mon, Sep 9, 11:54 AM

@Zoranzoki21 OK, then I have to ask (again): Why?

This seems to be trying to fix a completely different problem: namely, an otherwise correct content-page count that "seems too low" to the wiki community. That is not the issue we have here. (Granted, it may become that now that the community is discussing the proposed change… [grin])

I would like to see some attempts to actually diagnose and fix the problem (of having an impossibly high reported content-page count) before we just "paper over" the problem (by changing counting methods) and ignore it.

For example, does anyone recall (or can anyone find) a similar report for another Wikimedia wiki (not "too low" of a count, which is reported "all the time", but one that's "impossibly high")? I can't recall ever seeing such a report that wasn't based on (1) a misunderstanding of how MediaWiki works or how the wiki in question was configured, or (2) a temporary condition for a new wiki that was fixed the next time it was recounted.

The number should be updated twice a month. If the number doesn't meet community expectations, we should ask what is needed, so we can change the config to do what is expected.

Dcljr added a comment. Tue, Sep 10, 1:00 AM

Are people actually reading this thread, or are they just skimming and trying to get the gist of what is being said?

This is not a "what configuration do you want" issue; this is a counting "bug". (Maybe it should be marked as such?)

The count is not simply "not meeting community expectations" (to paraphrase Urbanecm). This is an error in the content-page count: the current count cannot possibly be correct, regardless of what the community thinks or wants. (Thus, IMO, the community should not have been consulted until sysadmins/developers tried to figure out what was actually causing the problem. As I suggested above, they might request a change that would "hide" the problem, rendering it moot, but they cannot fix the problem with a configuration change request.)

If "we don't care" what is causing the problem, well, I guess I have nothing more to say about this. But I kinda figure it would be nice to know why this has happened, so it could be avoided or fixed legitimately in the future, if it happens again on a new wiki (or any wiki).

Finally, again I ask @Urbanecm: when you say "i just looked if script is a solution, and forgot to update", does this mean you actually ran a maintenance script? If so, which one (updateArticleCount.php, initSiteStats.php, or something else) and what was the result?