
pageviews files missing since yesterday 10th November
Closed, Resolved · Public

Description

We currently consume the pageviews files that you generate every day as input to our Wikipedia infrastructure.

https://dumps.wikimedia.org/other/pageviews/2016/2016-11/

We have noticed that there have been no new files since yesterday, the 10th, and because of that we are having issues on our side.

I would like to know whether this issue has already been noticed on your side and whether any action is under way to fix the generation of these files.

Thanks

Event Timeline

ariel@stat1002:/srv$ ls -lt /mnt/hdfs/wmf/data/archive/pageview/legacy/hourly/2016/2016-11 | head
total 11970181
-rwxrwxrwt 1 hdfs hdfs 59780065 Nov 10 20:04 pageviews-20161110-190000.gz
-rwxrwxrwt 1 hdfs hdfs 60463207 Nov 10 19:11 pageviews-20161110-180000.gz
-rwxrwxrwt 1 hdfs hdfs 61783210 Nov 10 18:14 pageviews-20161110-170000.gz

These files are missing on stat1002 where we would pick them up for rsync to dumps.wikimedia.org.
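
For anyone who wants to check for gaps on their side, here is a minimal sketch. It assumes the hourly files always follow the `pageviews-YYYYMMDD-HH0000.gz` naming seen in the listing above. The function names `expected_names` and `missing_hours` are hypothetical and are not part of any Wikimedia tooling.

```python
from datetime import datetime, timedelta


def expected_names(start, end):
    """Yield one expected file name per hour from start (inclusive) to end (exclusive)."""
    hour = start
    while hour < end:
        # Naming pattern assumed from the directory listing above.
        yield hour.strftime("pageviews-%Y%m%d-%H0000.gz")
        hour += timedelta(hours=1)


def missing_hours(present, start, end):
    """Return the expected file names that do not appear in `present`."""
    present = set(present)
    return [name for name in expected_names(start, end) if name not in present]


# The three most recent files from the listing above.
listing = [
    "pageviews-20161110-170000.gz",
    "pageviews-20161110-180000.gz",
    "pageviews-20161110-190000.gz",
]

# Check the window 17:00 to 22:00 on 10 November 2016.
gaps = missing_hours(listing, datetime(2016, 11, 10, 17), datetime(2016, 11, 10, 22))
print(gaps)
```

Feeding it the names from a directory listing (local or from the dumps page) and the window you care about gives the hours that still need to be fetched once the backfill completes.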

The issue also seems to be affecting the API (example). There has been no data since 9 November, when typically we would see data for the 10th and 11th by now.

Sorry for the inconvenience!

We deployed the new Analytics refinery on Thursday, and one of its tasks was to kill and restart the Oozie jobs responsible for crunching webrequest data (and afterwards loading the Cassandra pageview data and files). We failed to load two hours of logs, and other jobs were waiting for them.

Joseph just started an Oozie job to fix the problem, so we should get all the data back in the next two hours.

elukey triaged this task as Medium priority. Nov 12 2016, 7:43 PM
Milimetric edited projects, added Analytics-Kanban; removed Analytics.
Milimetric moved this task from Next Up to In Progress on the Analytics-Kanban board.

This issue seems to be fixed now, thanks.

Great. I'll go ahead and close this.