
Restore previous (January) reports on WikiStats
Closed, Resolved · Public

Description

Wikisource statistics are clearly wrong, as if most dumps had not been processed (compare).

Erik, I thought you had stopped updates: were the scripts run by you? If so thank you and I'm sure you can fix it. :-) If someone else did, however, we'd better restore any available backup of the HTML directories as soon as possible to avoid giving misleading information.

The footer says:

Generated on Tuesday February 14, 2017 13:47 from recent database dump files.
Data processed up to Tuesday January 31, 2017

According to https://stats.wikimedia.org/WikiCountsJobProgress.html, the reports have been updated in the last 4 days; Wikipedia in the last 4 hours.

https://stats.wikimedia.org/WikiCountsJobProgressCurrent.html says

Progress in last week

at end of day still to do:
2017-02-12 wb:120/120 wk:171/171 wn:32/32 wo:17/17 wp:286/286 wq:89/89 ws:64/64 wv:14/14 wx:8/8
2017-02-13 wk:7/171 wn:32/32 wo:17/17 wp:56/286 wq:89/89 ws:64/64 wv:14/14 wx:8/8
2017-02-14 wp:19/286 wv:14/14 wx:8/8
2017-02-15 wp:11/286 wx:2/8
2017-02-16 wp:6/286 wx:2/8
2017-02-17 wp:3/286 wx:1/8
2017-02-18 wp:1/286
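
For anyone who wants to check these lines mechanically, here is a minimal sketch in Python. It assumes each entry reads `project:still_to_do/total`, which is how the "still to do" heading suggests they should be read; the function names are made up for illustration and are not part of the Wikistats scripts.

```python
import re

# One entry looks like "wp:56/286": project code, dumps still to do, total dumps.
# (This reading of the two numbers is an assumption based on the page heading.)
ENTRY = re.compile(r"(\w+):(\d+)/(\d+)")


def parse_progress_line(line):
    """Split one progress line into its date and a {project: (todo, total)} dict."""
    date, _, rest = line.strip().partition(" ")
    entries = {
        code: (int(todo), int(total))
        for code, todo, total in ENTRY.findall(rest)
    }
    return date, entries


def untouched_projects(entries):
    """Return the projects for which no dump had been processed yet."""
    return sorted(code for code, (todo, total) in entries.items() if todo == total)


date, entries = parse_progress_line(
    "2017-02-13 wk:7/171 wn:32/32 wo:17/17 wp:56/286 wq:89/89 ws:64/64 wv:14/14 wx:8/8"
)
print(date, untouched_projects(entries))
```

Read this way, at the end of 2017-02-13 (the day before the reports were generated) none of the 64 `ws` dumps had been processed, which would fit the broken Wikisource figures.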

Event Timeline

Restricted Application added projects: Internet-Archive, Analytics. Feb 19 2017, 10:26 AM
Restricted Application added a subscriber: Aklapper.
Ankry added a subscriber: Ankry. Feb 19 2017, 11:19 AM

The same problem affects other wiki stats, such as Wikipedia: https://stats.wikimedia.org/EN/Sitemap.htm or Wiktionary: https://stats.wikimedia.org/wiktionary/EN/Sitemap.htm ; however, the problem is not so clearly visible there.
At the moment the statistics falsely report that many active wikis had fewer than 10 edits in January, including:

  • zhwiki
  • ukwiki
  • arwiki
  • bnwiki
  • enwiktionary
  • itwikisource
  • plwikisource

and many others.
This information is misleading, suggesting that activity dropped significantly there recently.

Also, for some wikis the links lead to older stats (December 2016), e.g.:

  • enwikisource
  • frwikisource
  • dewikisource
  • ruwikisource
  • enwiki
  • dewiki
  • ruwiki
  • eswiki
  • plwiki

(It seems likely that the stats were generated when most of the February dumps were not ready yet)

Ankry triaged this task as Unbreak Now! priority. Feb 19 2017, 11:22 AM
Ankry added a subscriber: Zdzislaw.
Restricted Application added subscribers: Jay8g, TerraCodes. Feb 19 2017, 11:22 AM

@Nemo_bis, I'm still running Wikistats monthly until Wikistats 2.0 takes over. As you know, the latter is in the UI design phase.

@Ankry, you're right: (It seems likely that stats were generated when most of February dumps were not ready yet)
That's indeed what happened, with unexpected consequences.
I updated the banner to point to the new UI design page [1]. I was expecting to see a mix of December and January stats, but I didn't notice that so many wikis were flagged as 'less than 10 edits'.

Right now all January dumps have been processed except for English Wikipedia (running). So let me start with rebuilding the reports, and check results.

There is one change in the scripts effective this month:
From this month, edits on redirect pages will also count towards per-user activity, and thus raise the active users stats. As always, this will affect all history, so MoM and YoY comparisons remain legit. As this will be the rule for Wikistats 2.0, updating Wikistats 1.0 helps comparison between old and new stats.

[1] https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedback/Round1

I found two anomalies so far, but can't reproduce the problem.

Anomaly 1) Possibly not related:
On every run obsolete temporary directories are removed from ../dumps/tmp/ folder
but only if file '@Ready' exists in that folder, signaling that WikiCounts.pl step has completed normally.
Log shows ~800 folders for ~800 wikis still existed in ../dumps/tmp/ without file '@Ready',
thus after WikiCounts.pl step ended abnormally.
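
The cleanup rule in anomaly 1 can be sketched roughly as follows. This is a Python illustration only, not the actual Perl code in the Wikistats scripts; the function name and the return value are assumptions made for the example.

```python
import shutil
from pathlib import Path

# Marker file that, per the description above, signals a normally completed
# WikiCounts.pl step for that folder.
READY_MARKER = "@Ready"


def clean_tmp(tmp_root):
    """Remove temp folders that contain the marker; leave all others alone.

    Returns (removed, kept) as two lists of folder names, so that folders
    left behind by an abnormal run can be reported.
    """
    removed, kept = [], []
    for folder in sorted(Path(tmp_root).iterdir()):
        if not folder.is_dir():
            continue
        if (folder / READY_MARKER).exists():
            shutil.rmtree(folder)  # step finished normally, folder is obsolete
            removed.append(folder.name)
        else:
            kept.append(folder.name)  # no marker: step ended abnormally
    return removed, kept
```

With a rule like this, a long `kept` list after a run (here about 800 folders) is itself a signal that the counting step did not finish for those wikis.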

Anomaly 2) Surely related:
For the missing languages, the file StatisticsMonthly.csv contains corrupt lines from a certain month, with article count 0.
This causes the reporting step to omit those languages from the sitemap page.
(BTW results of previous run are still accessible, just not linked from sitemap page)
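
A check for the condition in anomaly 2 could look roughly like the Python sketch below. The column layout is an assumption made purely for illustration (language code first, article count in the third column); the real layout of StatisticsMonthly.csv is not described in this task, so `article_col` would need adjusting.

```python
import csv
import io


def languages_with_zero_articles(csv_text, article_col=2):
    """Return the language codes that have at least one row with article count 0.

    Assumed, hypothetical layout: column 0 is the language code and
    column `article_col` holds the article count as an integer.
    """
    flagged = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= article_col:
            continue  # too short to judge, skip rather than guess
        try:
            count = int(row[article_col])
        except ValueError:
            continue  # non-numeric field, not the case described here
        if count == 0:
            flagged.add(row[0])
    return flagged


# Invented sample rows, only to show the call; not real Wikistats data.
sample = "aa,2017-01,1200\nbb,2017-01,0\n"
print(languages_with_zero_articles(sample))
```

Running something like this before the reporting step would list the languages about to drop off the sitemap page.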

Both the old and new version of WikiCountsInput.pm (the one module that was changed recently) produce normal results.
So I chose to rerun WikiCounts.pl for those missing languages, which may take two or three days.
Fortunately the stats for really large Wikipedias have not been corrupted.

As for restoring previous html:
I only back up the English-language html files, once a week, which is already 0.5 GB.
So I could restore the English version and deactivate the other languages, but only sitemap pages are really faulty.
So I could restore only the 9 English sitemap pages.
But instead I will run a new reporting step for Wikipedia soonish (less than a day) and see whether the list of missing languages has shrunk. If it has (as expected), I will continue to repair the remainder.

Milimetric moved this task from Incoming to Radar on the Analytics board. Feb 23 2017, 4:45 PM
ezachte closed this task as Resolved. Feb 26 2017, 7:28 PM
ezachte claimed this task.

Stats for the missing wikis were regenerated over the past days. Each day the Wikipedia reports were refreshed. Yesterday the overhaul was completed.