Mediacounts missing top1000 files after 2018-01-01
Closed, Resolved | Public | 3 Estimated Story Points

Description

The top1000 files for the mediacounts are no longer being produced after January 1, 2018. See a related issue: T122864.

URL: https://dumps.wikimedia.org/other/mediacounts/daily/2018/

Event Timeline

Mentioned in SAL (#wikimedia-analytics) [2018-01-23T20:10:04Z] <ottomata> hdfs dfs -chmod 775 /wmf/data/archive/mediacounts/daily/2018 for T185419

Huh! Ok, so when the new 2018 directory was created by the Hadoop jobs that compute the daily mediacounts files, it was created with a file mode that made it not writable by @ezachte's top1000 scripts.

I've chmod-ed the 2018 directory for now, but we need to make the job that creates this data chmod new directories g+w.
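
For illustration only, here is a minimal sketch of what "chmod g+w" does to a directory's mode bits. It uses Python on a local POSIX filesystem as a stand-in; the real directories live in HDFS and the actual fix is in the refinery patch, not this code. The helper name `add_group_write` and the temporary `2018` directory are hypothetical.

```python
import os
import stat
import tempfile


def add_group_write(path: str) -> int:
    """Add the group-write bit to `path`; return the resulting permission bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IWGRP)  # equivalent of `chmod g+w`
    return stat.S_IMODE(os.stat(path).st_mode)


def demo() -> int:
    """Create a directory with mode 755, add g+w, and return the new mode."""
    with tempfile.TemporaryDirectory() as root:
        year_dir = os.path.join(root, "2018")  # hypothetical local stand-in
        os.mkdir(year_dir)
        os.chmod(year_dir, 0o755)  # group can read and traverse, but not write
        return add_group_write(year_dir)


if __name__ == "__main__":
    print(oct(demo()))  # on a POSIX system: 0o775
```

Only the group-write bit is added and the other bits are left alone, so a 755 directory ends up at 775, the same mode as in the manual chmod logged above.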

Change 405938 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery@master] Chmod yearly mediacounts directory so ezachte's scripts can write top1000 files

https://gerrit.wikimedia.org/r/405938

Fixed as of 2018-01-23.
@ezachte: Could you launch a backfill for 2018-01-01 to 2018-01-22?
Many thanks!
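
As a rough aid for the backfill, the days in the requested range can be listed with a short Python sketch. The function name `backfill_dates` is hypothetical, and this is not how the top1000 scripts themselves are invoked; it only enumerates the dates, inclusive at both ends.

```python
from datetime import date, timedelta
from typing import List


def backfill_dates(first: date, last: date) -> List[date]:
    """Return every calendar day from `first` to `last`, inclusive."""
    span = (last - first).days
    return [first + timedelta(days=offset) for offset in range(span + 1)]


if __name__ == "__main__":
    missing = backfill_dates(date(2018, 1, 1), date(2018, 1, 22))
    for day in missing:
        print(day.isoformat())
    print(len(missing), "days to backfill")
```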

Change 405938 merged by Joal:
[analytics/refinery@master] Chmod yearly mediacounts directory so ezachte's scripts can write top1000 files

https://gerrit.wikimedia.org/r/405938

JAllemandou moved this task from Paused to Done on the Analytics-Kanban board.
JAllemandou set the point value for this task to 3.