
pagecounts stats are behind by about 16 hours
Closed, ResolvedPublic

Description

It's no longer possible to get the stats for the last hour immediately (from http://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-02/), as the stats seem to be behind by about 16 hours.

This seems to have been the case since about January 28.

I had a tool that shows users what's popular right now, and thus I miss having the ability to see those stats immediately.

Event Timeline

Bawolff created this task. Feb 17 2015, 10:53 PM
Bawolff raised the priority of this task from to Needs Triage.
Bawolff updated the task description.
Bawolff added a subscriber: Bawolff.
Restricted Application added a subscriber: Aklapper. Feb 17 2015, 10:53 PM
Hydriz added a subscriber: Hydriz. Feb 20 2015, 3:19 AM

This issue seems to be temporary; I can't reproduce it.

Well, it's 4:30 am UTC right now, and only the 2:00 file is up. So that's still 2 hours behind, which is much better but still not as good as it used to be.
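For anyone who wants to check the lag themselves, here is a rough sketch of the arithmetic. It assumes the files are named like `pagecounts-YYYYMMDD-HHMMSS.gz` and that the timestamp in the name is UTC; the function names, the sample file list, and the date are made up for illustration, and fetching the actual directory listing is left out.

```python
import re
from datetime import datetime, timezone

# Assumed filename pattern: pagecounts-YYYYMMDD-HHMMSS.gz (timestamp taken as UTC).
PATTERN = re.compile(r"pagecounts-(\d{8})-(\d{6})\.gz")


def latest_dump_time(filenames):
    """Return the newest timestamp encoded in the given filenames, or None."""
    stamps = []
    for name in filenames:
        match = PATTERN.search(name)
        if match:
            stamp = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
            stamps.append(stamp.replace(tzinfo=timezone.utc))
    return max(stamps) if stamps else None


def lag_hours(filenames, now):
    """Hours between `now` and the newest file, or None if no file matched."""
    latest = latest_dump_time(filenames)
    if latest is None:
        return None
    return (now - latest).total_seconds() / 3600


# Illustrative values only: newest file stamped 02:00, checked at 04:30 UTC.
files = ["pagecounts-20150220-010000.gz", "pagecounts-20150220-020000.gz"]
now = datetime(2015, 2, 20, 4, 30, tzinfo=timezone.utc)
print(lag_hours(files, now))  # 2.5
```

By this measure the newest file is 2.5 hours old at 4:30, which is the "about 2 hours behind" described above.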

Nuria added a project: Analytics-Kanban.
Nuria added a subscriber: Ottomata.
Nuria added a subscriber: Nuria.

Added Ottomata. I believe the pagecounts in that directory have recently started being updated from Hadoop.

Hi, yes, we did a cluster upgrade on Monday, which caused a few jobs to lag behind. Everything should be back in order, please let me know if it isn't.

Thanks!

Ah yes, 2 hours behind. That will be the case going forward. The backend that is used to generate pagecounts-raw has been changed[1].

The good news is, the new backend is less lossy than the old one, so counts should be a little more reflective of reality.

[1] https://lists.wikimedia.org/pipermail/analytics/2015-January/003259.html

Ottomata closed this task as Resolved. Feb 23 2015, 2:39 PM
Ottomata claimed this task.
kevinator moved this task from Next Up to Done on the Analytics-Kanban board. Feb 24 2015, 3:15 PM