
Wikistats doesn't yet know of all content namespaces on Wikisource
Open, Normal, Public

Description

Since I started editing on Wikisource (less than 2 years ago) I have been looking at the statistics, and I noticed that the article count is computed differently for different languages. For example, for en and fr ws the number of articles comprises pages in the Main namespace (without redirects) and also the Page, Author and Index namespaces (for en ws: approx. 500k in Main and 1.1 million in Author, Page and Index), but for pl, it and other ws the article count covers only the Main namespace (for pl ws only ~100k, excluding ~350k in the Page ns).
It took me some time to find the cause in the Wikistats code:

The problem is that line 47 of the WikiCountsInput.pm file contains a list of the ws namespaces that are counted. Namespaces 102, 104 and 106 are the Author, Page and Index namespaces for en and fr ws, but these numbers are not the same on every Wikisource: for pl ws Author, Page and Index are 104, 100 and 102; for it ws they are 102, 108 and 110; and so on for the others.
Thus the current pl and it ws stats are heavily understated compared to en and fr. This situation is "unfair", because most of the Wikisource content is in the Page ns.

Could you modify the Wikistats code to identify ws content namespaces by their canonical name, or add the correct numbers for pl and it ws?

Regards,
Z.

Event Timeline

Zdzislaw created this task. Feb 21 2016, 5:56 PM
Restricted Application added subscribers: StudiesWorld, Aklapper. Feb 21 2016, 5:56 PM

@Zdzislaw: The hardcoded extra namespaces for some Wikisource projects are older code. The newer way is to follow the API, which lists all content namespaces per wiki. Every day I harvest these settings for all wikis.

Here is today's list of countable namespaces, mostly from the API + presets mainly for wikisource, I wonder should we remove the presets? That would require the API to be up to date for all wikisource wikis. Is that the case?

Zdzislaw added a comment. Edited Feb 25 2016, 4:23 PM

Here is today's list of countable namespaces, mostly from the API + presets mainly for wikisource, I wonder should we remove the presets? That would require the API to be up to date for all wikisource wikis. Is that the case?

@ezachte:
The list is completely wrong regarding the Wikisource content namespaces, e.g.:

  • for en ws: ws,en,0|102|104|106, where 102 -> Author; 104 -> Page; 106 -> Index (ok)
  • for pl ws: ws,pl,0|102|104|106, where 102 -> Index; 104 -> Author; and there is no 106 NS in pl ws (wrong). It should be ws,pl,0|104|100|102 (Author, Page, Index)
  • for it ws: (wrong)...
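
To make the mismatch concrete, here is a minimal Python sketch that parses lines in the `project,lang,ns|ns|...` format quoted above and compares them with the namespace IDs given in this comment. The function names and the `EXPECTED` table are illustrative assumptions, not part of Wikistats.

```python
def parse_countable_line(line):
    """Split 'ws,pl,0|102|104|106' into (project, lang, set of namespace ids)."""
    project, lang, ns_field = line.strip().split(",", 2)
    return project, lang, {int(ns) for ns in ns_field.split("|")}


# Hypothetical lookup table. The values are the ones stated in this comment:
# Main (0) plus the Author, Page and Index namespaces of each wiki.
EXPECTED = {
    ("ws", "en"): {0, 102, 104, 106},
    ("ws", "pl"): {0, 104, 100, 102},
}


def check_line(line):
    """Return (missing, extra) namespace ids relative to EXPECTED."""
    project, lang, listed = parse_countable_line(line)
    expected = EXPECTED[(project, lang)]
    return sorted(expected - listed), sorted(listed - expected)


print(check_line("ws,en,0|102|104|106"))  # ([], [])
print(check_line("ws,pl,0|102|104|106"))  # ([100], [106])
```

For pl ws the preset line misses namespace 100 (Page) and lists 106, which does not exist there.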

The valid Page and Index namespace values for each Wikisource can be obtained with the API query /w/api.php?action=query&format=json&prop=&list=&meta=proofreadinfo&piprop=namespaces. For pl ws, https://pl.wikisource.org//w/api.php?action=query&format=json&prop=&list=&meta=proofreadinfo&piprop=namespaces returns:

{"batchcomplete":"","query":{"proofreadnamespaces":{"index":{"id":102},"page":{"id":100}}}}
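
As a rough sketch of how a harvesting script could read this response, the Python below extracts the Page and Index namespace IDs from the JSON shown above. The function name `proofread_namespaces` is made up for illustration, and the sample string is the pl ws response quoted in this comment; fetching it over HTTP is left out.

```python
import json


def proofread_namespaces(response_text):
    """Return the Page and Index namespace ids from a proofreadinfo response."""
    data = json.loads(response_text)
    ns = data["query"]["proofreadnamespaces"]
    return {"page": ns["page"]["id"], "index": ns["index"]["id"]}


# The pl ws response quoted above.
sample = (
    '{"batchcomplete":"","query":{"proofreadnamespaces":'
    '{"index":{"id":102},"page":{"id":100}}}}'
)

print(proofread_namespaces(sample))  # {'page': 100, 'index': 102}
```

Note that this query covers only Page and Index; the Author namespace has to come from another source.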

The Page and Index ns values for each ws are also defined in the InitialiseSettings.php file.

The Author namespace values are also not constant across Wikisources; they too are defined in InitialiseSettings.php, e.g. 104 for pl ws.

Please modify the file so that the stats data are collected from the correct Main, Author, Page and Index namespace values.

Regards,

Z.

it should be ws,pl,0|104|100|102

That's what the content namespaces configuration thinks too: https://pl.wikisource.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces

It seems that meta=siteinfo&siprop=namespaces isn't being fully used yet. There should be no need of "presets" for Wikisource any longer.

Nemo_bis renamed this task from "Wikisources Wikistats Inconsistencies" to "Wikistats doesn't yet know of all content namespaces on Wikisource". Feb 25 2016, 4:54 PM
Nemo_bis added a subscriber: ezachte.

Yes, @Nemo_bis is right!

The "presets" values are wrong and should be removed for all ws; the API should be used for every Wikisource.
The Author, Page and Index ns are defined in the $wgContentNamespaces variable, so the correct values can be obtained from the API, e.g. https://pl.wikisource.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces: for pl ws the namespaces marked "content": "" are 0, 100, 102 and 104 (main, page, index and author - ok!!)
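
A minimal Python sketch of that lookup, assuming the siteinfo response has already been decoded from JSON: a namespace counts as content when its entry carries the "content" key. The function name and the sample dictionary are hand-written for illustration; the sample is trimmed to the ids mentioned in this task plus namespace 1, a talk namespace that is not a content namespace.

```python
def content_namespace_ids(siteinfo):
    """Return the sorted ids of namespaces flagged as content in a siteinfo reply."""
    namespaces = siteinfo["query"]["namespaces"]
    return sorted(
        int(ns_id)
        for ns_id, info in namespaces.items()
        # The default JSON format marks content namespaces with "content": "";
        # formatversion=2 uses a real boolean, so both forms are accepted here.
        if info.get("content") not in (None, False)
    )


# Hand-written, trimmed sample modelled on the pl ws values in this task.
sample = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "content": ""},
            "1": {"id": 1},
            "100": {"id": 100, "content": ""},
            "102": {"id": 102, "content": ""},
            "104": {"id": 104, "content": ""},
        }
    }
}

print(content_namespace_ids(sample))  # [0, 100, 102, 104]
```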

Z.

Yes, that's why I asked:

"I wonder should we remove the presets? That would require the API to be up to date for all wikisource wikis. Is that the case?"

So apparently the answer is yes. Presets can be done away with. Will do.

Yes, that's why I asked:
"I wonder should we remove the presets? That would require the API to be up to date for all wikisource wikis. Is that the case?"

Yes, but ... you also wrote:
"Here is today's list of countable namespaces, mostly from the API + presets mainly for wikisource (...)"
so... it seemed to me that the list should contain both the proper NS values (taken from the API) AND the wrong ones (from the "presets"), but e.g. for pl or it ws the file contains only the "wrong" values (the "presets").

So apparently the answer is yes. Presets can be done away with. Will do.

Thank you!

Z.

Nemo_bis triaged this task as Normal priority. Feb 25 2016, 9:31 PM
Ankry added a subscriber: Ankry. Apr 16 2016, 8:07 PM