TL;DR: The webstatscollector output for the files between 2014-08-24 14:00
and 2014-08-27 21:00 shows some irregularities. This affects
both the pagecounts and projectcounts files from
https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-08/
and all services that process them.
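To check or reprocess the affected range, it helps to list the hourly files it covers. The following is a minimal sketch. It assumes that every hour follows the `pagecounts-YYYYMMDD-HH0000.gz` naming of [1] and [2]. If some hours use a different suffix on the dumps server, the generated names will not match, so compare them against the directory listing. `hourly_pagecount_urls` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List

# Directory from the report; valid here because the whole range lies in August 2014.
BASE_URL = "https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-08/"


def hourly_pagecount_urls(start: datetime, end: datetime) -> List[str]:
    """Return hourly pagecounts URLs for the half-open range [start, end).

    Assumption: names follow the pattern of references [1] and [2],
    i.e. pagecounts-YYYYMMDD-HH0000.gz.
    """
    urls = []
    current = start
    while current < end:
        urls.append(BASE_URL + current.strftime("pagecounts-%Y%m%d-%H0000.gz"))
        current += timedelta(hours=1)
    return urls


if __name__ == "__main__":
    affected = hourly_pagecount_urls(datetime(2014, 8, 24, 14), datetime(2014, 8, 27, 21))
    print(len(affected))  # 79 hourly files
    print(affected[0])
    print(affected[-1])
```

The range is half-open because the 2014-08-27 21:00 file [2] is already good again. The 2014-08-24 14:00 file [1] is included, and it is a useful known-good baseline.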
The pagecounts file for 2014-08-24 14:00 [1] did not show
irregularities, but the file from one day later shows an 80% drop for
a few pages.
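The report does not say how the 80% drop was measured. One way to find such drops is to compare the same pages in two hourly files. The following is a minimal sketch. It assumes the whitespace-separated pagecounts-raw line format `project page_title view_count bytes`. The function names and the `min_views` cutoff are hypothetical choices, not part of webstatscollector.

```python
import gzip
from typing import Dict, Iterable, List, Tuple

PageKey = Tuple[str, str]  # (project code, page title)


def parse_pagecounts(lines: Iterable[str]) -> Dict[PageKey, int]:
    """Map (project, page_title) to its hourly view count.

    Assumes the line format 'project page_title view_count bytes';
    malformed lines are skipped.
    """
    counts: Dict[PageKey, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        counts[(fields[0], fields[1])] = int(fields[2])
    return counts


def load_pagecounts(path: str) -> Dict[PageKey, int]:
    """Read a downloaded, gzipped hourly pagecounts file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return parse_pagecounts(handle)


def find_drops(before: Dict[PageKey, int], after: Dict[PageKey, int],
               threshold: float = 0.8,
               min_views: int = 100) -> List[Tuple[PageKey, int, int]]:
    """Return (page, old, new) for pages whose views fell by at least `threshold`.

    threshold=0.8 means an 80% drop. Pages with fewer than `min_views`
    views in `before` are ignored because their hourly counts are noisy.
    """
    drops = []
    for key, old in before.items():
        if old < min_views:
            continue
        new = after.get(key, 0)  # a page missing from `after` counts as 0 views
        if (old - new) / old >= threshold:
            drops.append((key, old, new))
    # Most-viewed pages first, since their drops are the most conspicuous.
    return sorted(drops, key=lambda drop: drop[1], reverse=True)
```

For example, after downloading [1] and the file one day later, `find_drops(load_pagecounts("pagecounts-20140824-140000.gz"), load_pagecounts("pagecounts-20140825-140000.gz"))` lists the pages that lost at least 80% of their views. This assumes the second file follows the same naming.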
Because gadolinium (the host that writes the pagecounts files) showed a
high and still increasing process count for services that were no longer
needed (bug 70053), those services were turned off around 2014-08-26 19:00:00.
Although this freed more resources on gadolinium, its
webstatscollector collector process degraded further. Because the
service did not recover (the UDP receive buffers filled up again and
again, the disks could not keep up with the write load, and the service had
accumulated 95GB of virtual memory since its last restart), it was
restarted on 2014-08-28 at ~15:32.
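The report does not say how the repeatedly full UDP receive buffers were observed on gadolinium. On Linux, one common signal is the `RcvbufErrors` counter in `/proc/net/snmp`. It counts datagrams that were dropped because a socket's receive buffer was full. The following is a minimal sketch under that assumption. The helper names are hypothetical.

```python
import time
from typing import Dict


def parse_udp_counters(snmp_text: str) -> Dict[str, int]:
    """Extract the 'Udp:' counters from the contents of /proc/net/snmp.

    The file has a header line and a value line per protocol, both starting
    with the same prefix; 'UdpLite:' lines do not match 'Udp:' and are ignored.
    """
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(value) for value in values)))


def read_rcvbuf_errors(path: str = "/proc/net/snmp") -> int:
    """Return the cumulative count of UDP datagrams dropped on full receive buffers."""
    with open(path, encoding="ascii") as handle:
        return parse_udp_counters(handle.read()).get("RcvbufErrors", 0)


def rcvbuf_errors_per_second(interval: float = 60.0) -> float:
    """Sample the counter twice, `interval` seconds apart, and return the drop rate."""
    first = read_rcvbuf_errors()
    time.sleep(interval)
    return (read_rcvbuf_errors() - first) / interval
```

The counter is cumulative since boot, so the rate between two samples is more telling than its absolute value. A rate that stays above zero matches the "filled up again and again" symptom.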
Because the restart did not relieve the situation either, the service was
moved to tmpfs on 2014-08-28 at ~19:48. This reduced the load on the disks and
made the service work again. The files starting at 2014-08-27 21:00 [2]
are good again.
Thanks ottomata for all those fixes!
More details are in the corresponding IRC channel logs [3].
Closer investigation of the files between 2014-08-24 14:00 and
2014-08-27 21:00 is still pending.
[1] https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-08/pagecounts-20140824-140000.gz
[2] https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-08/pagecounts-20140827-210000.gz
[3] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140827.txt
Version: unspecified
Severity: normal