
Wikistats undercounted redirects on non-English wikis since January (and hence overcounted normal articles)
Closed, Declined (Public)

Description

Starting in January 2015, Wikistats gradually lost all non-English translations of the #REDIRECT tag.
This did not happen overnight: over at least two months, the non-English tags disappeared from LanguageCodes.csv for more and more wikis.

On every run, Wikistats parses the PHP language files in stat1002:/a/mediawiki/core/languages/messages/MessagesXX.php (where XX is the language code).
For some reason this parse returned no results for more and more message files (probably triggered by an update to the message file in question, which would explain the gradual increase in the number of wikis affected).
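To illustrate the kind of parse involved, here is a minimal Python sketch (not the actual Wikistats code, which is not shown in this task). It assumes the redirect aliases sit in a 'redirect' entry of a $magicWords array, written either as array( ... ) or [ ... ]; the sample text, the regular expressions and the function name parse_redirect_tags are all hypothetical. If the file layout changes so that the pattern no longer matches, the function returns an empty list, which is the failure mode described above.

```python
import re

# Hypothetical sample; the layout of the real MessagesXX.php files is an assumption here.
SAMPLE_MESSAGES_IT = """
$magicWords = array(
    'redirect' => array( '0', '#RINVIA', '#RINVIO', '#RIMANDO', '#REDIRECT' ),
    'notoc'    => array( '0', '__NOINDICE__', '__NOTOC__' ),
);
"""

# Capture everything between the opening "array(" or "[" of the 'redirect'
# entry and the first closing ")" or "]".
REDIRECT_ENTRY = re.compile(
    r"'redirect'\s*=>\s*(?:array\s*\(|\[)(.*?)(?:\)|\])", re.DOTALL
)
QUOTED = re.compile(r"'([^']*)'")


def parse_redirect_tags(php_source):
    """Return the redirect aliases found in php_source, or [] if none are found."""
    match = REDIRECT_ENTRY.search(php_source)
    if match is None:
        return []  # the silent "no results" case
    values = QUOTED.findall(match.group(1))
    # Drop the leading case-sensitivity flag; keep only '#'-prefixed aliases.
    return [v for v in values if v.startswith("#")]
```

An empty result by itself says nothing about whether the language really has no translated tag or whether the parse simply failed, so a caller has to treat it with care.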

Wikistats should have kept the existing tags, but instead it reverted to the defaults. For example, for Italian, LanguageCodes.csv previously contained
"it,(utf-8),Categoria,Media|File|Immagine|Image,Utente,#RINVIA|#RINVIO|#RIMANDO|#REDIRECT"
and afterwards contained only the defaults:
"it,(utf-8),(Category),(Image)|Image|File,(User),(#Redirect)"
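The intended behaviour ("keep the existing tags when the parse comes back empty") could be sketched as follows. This is only an illustration in Python, assuming the row layout shown above with the redirect tags in the sixth comma-separated field; the function name choose_redirect_tags and the constant DEFAULT_TAGS are hypothetical and not taken from Wikistats.

```python
DEFAULT_TAGS = ["#REDIRECT"]  # assumed fallback when nothing else is known


def choose_redirect_tags(previous_row, parsed_tags):
    """Pick the redirect tags to store for one language.

    previous_row: the existing LanguageCodes.csv line for this language, or None.
    parsed_tags:  tags freshly parsed from the message file (possibly empty).
    """
    if parsed_tags:
        return list(parsed_tags)  # a successful parse wins
    if previous_row:
        fields = previous_row.strip().strip('"').split(",")
        if len(fields) >= 6 and fields[5]:
            return fields[5].split("|")  # keep what was already known
    return list(DEFAULT_TAGS)  # only now fall back to the default


old_row = '"it,(utf-8),Categoria,Media|File|Immagine|Image,Utente,#RINVIA|#RINVIO|#RIMANDO|#REDIRECT"'
kept = choose_redirect_tags(old_row, [])
```

With an empty parse result, kept still holds the four Italian tags from the old row instead of dropping back to the English default.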

Thus, for all non-English wikis where this change occurred and where a full archive dump was processed, the number of pages counted as redirects dropped sharply, and all pages with an internationalized redirect tag were counted as normal articles instead.
Where Wikistats processes a stub dump there were no consequences, because it then relies on the redirect flag in the stub dump.
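The effect on the counts can be shown with a small Python sketch. It assumes that a page in a full archive dump is classified as a redirect when its text starts with one of the known tags, compared case-insensitively; the function names and sample pages are hypothetical, and the real Wikistats logic may differ in detail.

```python
def is_redirect(wikitext, redirect_tags):
    """True if the page text starts with any of the known redirect tags."""
    start = wikitext.lstrip().casefold()
    return any(start.startswith(tag.casefold()) for tag in redirect_tags)


def count_pages(pages, redirect_tags):
    """Return (articles, redirects) for a list of page texts."""
    redirects = sum(1 for text in pages if is_redirect(text, redirect_tags))
    return len(pages) - redirects, redirects


pages = [
    "#RINVIA [[Roma]]",              # localized redirect
    "#REDIRECT [[Milano]]",          # English redirect
    "Roma e' la capitale d'Italia.", # normal article
]

full_tags = ["#RINVIA", "#RINVIO", "#RIMANDO", "#REDIRECT"]
default_tags = ["#REDIRECT"]

before_bug = count_pages(pages, full_tags)    # (1, 2)
after_bug = count_pages(pages, default_tags)  # (2, 1)
```

With only the default tag, the localized redirect moves from the redirect count to the article count, which is the shift seen in the affected tables.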

For Wikipedias, only three monthly counts were based on full archive dumps (hand-coded exceptions made long ago at community request), namely sv:Swedish, jv:Javanese and sw:Swahili.
See e.g. Swedish Wikipedia before the bug occurred:
https://web.archive.org/web/20141012034438/https://stats.wikimedia.org/EN/TablesWikipediaSV.htm
Aug 2014: 1.9M articles, 1.2M redirects
After the bug occurred:
https://web.archive.org/web/20150319062119/http://stats.wikimedia.org/EN/TablesWikipediaSV.htm
Aug 2014: 3.1M articles, 175k redirects (those with the English tag #REDIRECT)
For the newest stats see http://stats.wikimedia.org/EN/draft/TablesWikipediaSV.htm
Aug 2014: 1.9M articles, 1.2M redirects

So the issue mostly affected other projects, where counts were by default based on full archive dumps (their shorter run times kept this feasible).

See also https://en.wikipedia.org/wiki/User_talk:Erik_Zachte#http:.2F.2Fstats.wikimedia.org.2Fwikivoyage.2FIT.2FTablesWikipediaIT.htm

I will restore LanguageCodes.csv to the version of Dec 2014 for all projects and disable parsing of the message files for now (tags for most languages are stable, and tags for the few new languages can be updated manually). As Wikistats is switching to stub dumps for all projects and languages (for an unrelated reason), the issue is only relevant for on-demand, ad hoc runs on full archive dumps, and no longer for the standard monthly metrics.

Event Timeline

ezachte claimed this task.
ezachte raised the priority of this task from to Medium.
ezachte updated the task description.
ezachte added subscribers: ezachte, Tbayer.

Removing assignee @ezachte as that Phabricator account has been deactivated. (If there are questions, it seems that @erik_zachte could be contacted.)

Ottomata subscribed.

WikiStats 1 is no longer maintained.