Confusing large differences in number of "Content pages" on thwikisource in 2019 (between 7600 and 12500 pages)
Open, Needs Triage, Public

Description

Dear Phabricator team,

User:Dcljr told me about "Content-page count weirdness" on Thai Wikisource.

Reference: https://th.wikisource.org/w/index.php?title=คุยกับผู้ใช้:B20180&diff=prev&oldid=99718

It seems that this project has had a problem with its article count over the last few weeks, as shown on https://th.wikisource.org/wiki/พิเศษ:สถิติ

Can anyone fix this problem so that the count is stable again?

Thank you for your assistance regarding this matter,

User:B20180

Event Timeline

@B20180: Is this a bug that the general (not Thai only) All-and-every-Wikisource community themselves plan to work on, and unrelated to MediaWiki or Wikimedia server configuration? If not then you will have to add relevant project tags so this task can be found by folks outside of the general (not Thai only) All-and-every-Wikisource community. :)

For some context: https://www.mediawiki.org/wiki/Help:Magic_words links to https://www.mediawiki.org/wiki/Manual:$wgArticleCountMethod and the default is that "the page must contain a wikilink to be considered valid" plus there are bugs such as https://phabricator.wikimedia.org/T212706
The maintenance script to update the numbers is run twice a month: at the beginning of the month and on the 15th. Hence a change in numbers on October 1st.

@B20180: I don't understand your last action here (assignment), can you please elaborate?

@B20180: Feel free to follow https://www.mediawiki.org/wiki/How_to_report_a_bug and provide 1) clear steps to reproduce including links, 2) what exactly you expect, 3) what you see instead.

@Aklapper: Sorry. I thought that he could help me, but he told me on Thai Wikisource that he can't.

Aklapper changed the task status from Open to Stalled. Nov 1 2019, 7:06 AM

@B20180: Please see and follow T234458#5563773; otherwise this task will get closed. Thanks!

@Aklapper: There are no "steps to reproduce" because we are talking about historical article counts here. And the "weirdness" being reported is "weird" precisely because it cannot be explained given knowledge of what constitutes an article, what kinds of edits cause article count changes, and the fact that the wiki is periodically recounted.

But it is also true that there is no "actionable" request here, so you can close this task as "invalid" on that basis, if you'd like.


That being said… since this task exists, I will give a short summary of the issue here, Just For The Record:

Here are the article-count "milestone" changes (i.e., passing or dropping below a milestone level reported at m:Wikimedia News) seen in the Thai Wikisource since February 2012, along with presumed explanations for the changes (each of which occurred over an approximately 24-hour period, based on the daily collection of article counts through the API):

  • 2012-05-10: 13,599 - 8,548 = 5,051 (-63%) [all Wikisources recounted, most for the first time ever]
  • 2019-05-08: 9,206 + 810 = 10,016 (+9%) [a user added copyright templates to many pages]
  • 2019-08-09: 12,628 - 4,989 = 7,639 (-40%) [unknown -- not a regularly scheduled recount, but maybe an off-cycle one??]
  • 2019-09-15: 7,755 + 4,748 = 12,503 (+61%) [regular recount, but what actually caused the change?]
  • 2019-10-01: 12,503 - 4,771 = 7,732 (-38%) [regular recount, but what actually caused the change?]
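The arithmetic in the list above can be re-checked with a short sketch. The counts are copied from the list; the function names are hypothetical, and the "regular recount day" check only encodes the 1st-and-15th schedule mentioned earlier in this thread, which is an assumption about how the recount job is actually scheduled.

```python
from datetime import date

# (date, count before, count after), copied from the list above.
MILESTONES = [
    (date(2012, 5, 10), 13599, 5051),
    (date(2019, 5, 8), 9206, 10016),
    (date(2019, 8, 9), 12628, 7639),
    (date(2019, 9, 15), 7755, 12503),
    (date(2019, 10, 1), 12503, 7732),
]


def describe_change(before, after):
    """Return (absolute change, percent change rounded to a whole number)."""
    delta = after - before
    return delta, round(100 * delta / before)


def on_regular_recount_day(day):
    """Assumption: regular recounts run on the 1st and 15th of each month."""
    return day.day in (1, 15)


for day, before, after in MILESTONES:
    delta, pct = describe_change(before, after)
    print(
        f"{day}: {before:,} -> {after:,} ({delta:+,}, {pct:+d}%), "
        f"regular recount day: {on_regular_recount_day(day)}"
    )
```

Under that assumption, only the 2019-09-15 and 2019-10-01 changes fall on a regular recount day, which matches the labels in the list.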

The first two don't need any further explanation, but the last three do: they should be explainable based on observed editing activity on the wiki, but I could not figure out what was causing the huge swings, looking at Special:RecentChanges, Special:NewPages, Special:Log/import, or Special:Log/delete. In each case, there didn't appear to be any mass page deletions, creations, moves, or imports in the previous days (RecentChanges) or weeks (the others) of sufficient magnitude to match the large changes; no mass edits of any kind to thousands of pages; and no relevant edits to widely used templates, either (i.e., to introduce wikilinks or remove them).

Note, BTW, that the current (as I type this) article count on the wiki is 7,674, almost the same as on Aug 9th and Oct 1st, which is consistent with the relatively low-traffic nature of the wiki (<100 changes per day), normally.

@Aklapper: There are no "steps to reproduce" because we are talking about historical article counts here.

@Dcljr: There must be steps to reproduce, as you must have realized somehow, somewhere, that there might be a problem.
How and where can this be seen? At https://th.wikisource.org/wiki/พิเศษ:สถิติ ? Somewhere else?

Thanks for posting some numbers, but currently nobody else knows how you got these numbers, or how to reproduce these numbers.
Again, please see https://www.mediawiki.org/wiki/How_to_report_a_bug - thanks.

@Aklapper This is not going to satisfy your request, but I'll say it anyway.

As I mentioned in my previous post (in a lot less detail), the counts reported above are based on API "statistics" queries run daily using a personal script on my own computer (which gives the same information displayed on the wiki's Special:Statistics page, if you looked at it at the right times in the past).
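For anyone who wants to collect comparable numbers, a minimal sketch of such a "statistics" query follows. It is not the personal script described above, which is not shown in this task; it only assumes the standard MediaWiki `action=query&meta=siteinfo&siprop=statistics` request and that the article count is returned in the `articles` field. The endpoint constant, function names, and User-Agent string are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed API endpoint for Thai Wikisource.
API_URL = "https://th.wikisource.org/w/api.php"


def build_statistics_url(api_url=API_URL):
    """Build the URL for a siteinfo 'statistics' query in JSON format."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def extract_article_count(response_text):
    """Pull the 'articles' (content pages) value out of the JSON response."""
    data = json.loads(response_text)
    return data["query"]["statistics"]["articles"]


def fetch_article_count(api_url=API_URL):
    """Query the live wiki; requires network access."""
    request = Request(
        build_statistics_url(api_url),
        headers={"User-Agent": "article-count-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        return extract_article_count(response.read().decode("utf-8"))
```

Running `fetch_article_count()` once a day and logging the result with a timestamp would give a history that can later be compared against recount dates.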

While those exact same counts cannot be verified by anyone else, very similar results can be gleaned from the page history of m:Wikisource/Table, which is based on the same kind of API queries, but at different times.

I am not going to dig through and find specific relevant diffs to link to, however, because they won't mean anything unless they can be correlated with information from Special:RecentChanges, Special:NewPages, Special:Log/import, and Special:Log/delete for the same time periods (which I did at the time these weird changes happened, as I mentioned above).

Since such correlating is no longer possible now that months have passed since the changes happened (only Special:Log/delete is still informative at this point), I don't see much use in taking any more time on this task.

Like I said, you can close it on the basis of "nothing to do" or "not enough information provided". Whatever.

Dcljr renamed this task from Content-page count weirdness to Content-page count weirdness observed on thwikisource. Dec 1 2019, 6:41 AM
Aklapper renamed this task from Content-page count weirdness observed on thwikisource to Confusing large differences in number of "Content pages" on thwikisource in 2019 (between 7600 and 12500 pages). Jan 7 2020, 9:11 PM
Aklapper changed the task status from Stalled to Open.
Aklapper updated the task description.

Just throwing this out there: perhaps refreshLinks.php should be run on thwikisource (followed by initSiteStats.php), just to make sure there are no inconsistencies in internal linking records that might have some effect on what is being discussed in this task. (I don't think it would fix the problem, but "it couldn't hurt", right?)

Adding relevant tag so someone else can make a decision about this. (Apologies if this is an inappropriate use of the tag.)