Problem with hadoop data ingestion impacting data delivery [8 pts]
Closed, Resolved (Public)

Description

We currently consume the hourly pagecounts-raw files that you generate as an input to our Wikipedia infrastructure.

http://dumps.wikimedia.org/other/pagecounts-raw/2016/2016-01/

We have noticed that no new files have appeared since today, 28 January, and this is causing issues on our side.

I would like to know whether you are already aware of this issue and whether work is under way to fix the generation of these files.

Event Timeline

DianaArq raised the priority of this task from to Needs Triage.
DianaArq updated the task description.
DianaArq subscribed.
ArielGlenn set Security to None.
ArielGlenn subscribed.

I checked on stat1002:/mnt/hdfs/wmf/data/archive/pagecounts-raw/2016/2016-01 and the files aren't there.

Hello,
The issue is known; an email has been sent to the analytics list about the problem.
Hadoop data ingestion was failing yesterday and was restored around 13:00 UTC today, but all the jobs further down the pipeline are running late.
Data will show up, but it'll take some time.

Milimetric renamed this task from "pagecounts raw from 28/01 are not present" to "pagecounts raw from 28/01 are not present [3 pts]". Jan 28 2016, 6:09 PM
Milimetric assigned this task to JAllemandou.
Milimetric edited projects, added: Analytics-Kanban; removed: Analytics.
Milimetric moved this task from Next Up to In Code Review on the Analytics-Kanban board.
Milimetric moved this task from In Code Review to Ready to Deploy on the Analytics-Kanban board.
JAllemandou renamed this task from "pagecounts raw from 28/01 are not present [3 pts]" to "Problem with hadoop data ingestion impacting data delivery [8 pts]". Jan 30 2016, 11:14 AM
JAllemandou moved this task from Ready to Deploy to In Progress on the Analytics-Kanban board.