
Hourly pageview data has stopped being published at dumps.wm.o
Closed, Resolved (Public)


mturk on #wikimedia-cloud reports that, as of midnight 2019-12-16, the hourly pageview listing on dumps.wm.o has not had new files since 2019-12-14 18:32.
I am mildly suspicious of the fact that a change (part of T234229) was merged 48h beforehand.
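
For reference, the size of the gap in the report above can be worked out with a small sketch like the following. It assumes both timestamps are in the same timezone (the report does not say), and the `max_delay` threshold of 3 hours is a hypothetical value chosen only for illustration, not a documented limit for this dataset.

```python
from datetime import datetime, timedelta


def staleness(last_listed: datetime, checked_at: datetime) -> timedelta:
    """Return how long it has been since the newest file was listed."""
    return checked_at - last_listed


def is_stale(last_listed: datetime, checked_at: datetime,
             max_delay: timedelta = timedelta(hours=3)) -> bool:
    """Flag the listing as stale once the gap exceeds max_delay.

    max_delay is a hypothetical threshold, not an official one.
    """
    return staleness(last_listed, checked_at) > max_delay


# Timestamps taken from the report; timezone is assumed to match.
last_listed = datetime(2019, 12, 14, 18, 32)
checked_at = datetime(2019, 12, 16, 0, 0)

gap = staleness(last_listed, checked_at)
print(gap)                                # 1 day, 5:28:00
print(is_stale(last_listed, checked_at))  # True
```

So at the time of the report the listing was roughly 29.5 hours behind.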

Event Timeline


The cluster is stalled because two hours of the day have not been refined.

Following up on this issue of missing wiki pageview data dumps (T240815#5743206): is it possible to have direct access to the log files, e.g. hdfs://analytics-hadoop/wmf/data/raw/webrequest/webrequest_text/hourly/2019/12/14 etc.?
For a research project I am doing, it would be incredibly helpful to have dependable daily raw data.

-is it possible to have direct access to the log files?

The only files that are accessible to the general public are the ones on dumps. We cannot grant access to the raw data.

^makes sense

Just noticed that some new data came in (currently good till 8am Dec 15). Is the bug fully fixed, is someone actively working on it, or are the tasks still being run manually?

We are doing infrastructure changes today, and things will start flowing at some point today or tomorrow. Remember that all this data is tier-2, and while delays of 1-2 days are not common, they can happen.

Got it. Just curious: are there any internal SLAs for tier-2 data? In my experience, such delays do seem to be uncommon.

fdans claimed this task.