
pageviews files missing since yesterday 1st December
Closed, Resolved (Public)

Description

We currently consume the pageviews files that you generate every day as an input to our Wikipedia infrastructure.

https://dumps.wikimedia.org/other/pageviews/2016/2016-12/

We have noticed that there have been no new files since yesterday, 1st December, at 22:00, and because of that we are having issues on our side.

I would like to know whether this issue has already been noticed on your side and whether there is any action under way to fix the generation of these files.
Thanks

Event Timeline

ariel@stat1002:~$ ls -lt /mnt/hdfs/wmf/data/archive/pageview/legacy/hourly/2016/2016-12 | head
total 1145803
-rwxrwxrwt 1 hdfs hdfs 55665150 Dec 1 23:16 pageviews-20161201-220000.gz
-rwxrwxrwt 1 hdfs hdfs 57487624 Dec 1 22:14 pageviews-20161201-210000.gz
-rwxrwxrwt 1 hdfs hdfs 58880082 Dec 1 21:24 pageviews-20161201-200000.gz
-rwxrwxrwt 1 hdfs hdfs 59713800 Dec 1 20:27 pageviews-20161201-190000.gz

Adding Analytics to look at this.
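
For consumers of these files, a gap like the one reported here can be detected automatically. The following is a minimal sketch, assuming the naming pattern visible in the listing above (one `pageviews-YYYYMMDD-HH0000.gz` file per hour); the function names `expected_names` and `missing_files` are hypothetical and not part of any Wikimedia tooling.

```python
from datetime import datetime, timedelta


def expected_names(start, end):
    """Yield one expected file name per hour from start to end, inclusive."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour <= end:
        # Pattern inferred from the directory listing, e.g. pageviews-20161201-220000.gz
        yield hour.strftime("pageviews-%Y%m%d-%H0000.gz")
        hour += timedelta(hours=1)


def missing_files(present, start, end):
    """Return the expected hourly file names that are absent from `present`."""
    present = set(present)
    return [name for name in expected_names(start, end) if name not in present]


# File names taken from the listing in this task.
present = [
    "pageviews-20161201-220000.gz",
    "pageviews-20161201-210000.gz",
    "pageviews-20161201-200000.gz",
    "pageviews-20161201-190000.gz",
]

# Check the window from 19:00 on 1 December to 01:00 on 2 December.
gaps = missing_files(present, datetime(2016, 12, 1, 19), datetime(2016, 12, 2, 1))
print(gaps)
```

With the four files shown in the listing, this reports the hours after 22:00 as missing. The list of present files would normally come from a directory or index listing rather than being hard-coded.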

Known issue: some jobs failed last night, and we are currently rerunning them.
Data should flow in today.

Closing the task; please reopen it if the data is not there within the next 24 hours. Sorry for the trouble, and thanks for the report!

Hello, we are seeing the same issue, starting tonight around 1 AM. Is this being looked at?

Thanks for the ping, we are taking care of the issue as we speak :)

https://gerrit.wikimedia.org/r/#/c/368383 should fix the issue. It will take a bit of time, but the new files will be created soon.

In the future, please open a new ticket if the old one has been resolved for this long, as the issue is likely to be different.

It seems that stat1002 is currently unreachable; since the new stat1005 host is taking over its duties, I have altered all the cron jobs to rsync from there to our public-facing webserver. Files should start showing up during the next hour.