Back-fill pageviews data for to May 2015
Closed, Resolved | Public | 13 Estimated Story Points


In an upcoming message to analytics-l, we'll propose cleaning up the pageview datasets currently listed on To do this, it would be best if we had as many dump files for the new data as possible. The puppet change to re-organize is being worked on here:

Event Timeline

Milimetric assigned this task to elukey.
Milimetric raised the priority of this task from to Medium.
Milimetric updated the task description. (Show Details)
Milimetric added a project: Analytics.
Milimetric added a subscriber: Milimetric.

Adding more info after a chat with Dan.

The dumps are not visible yet in the /other folder but only in

The 2015/* directories show data up to May 2015, but some of them contain only projectviews data and are missing pageviews.
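A rough sketch of how one could spot the affected directories, given a flat listing of file paths. The `pageviews-` and `projectviews-` file name prefixes and the example paths are assumptions for illustration, not taken from the actual dumps layout:

```python
from collections import defaultdict


def dirs_missing_pageviews(paths):
    """Return directories that hold projectviews files but no pageviews files.

    Assumes (hypothetically) that file names start with "pageviews-" or
    "projectviews-"; adjust the prefixes to the real naming scheme.
    """
    kinds = defaultdict(set)
    for path in paths:
        # Split "some/dir/file" into its directory and file name.
        directory, _, name = path.rpartition("/")
        if name.startswith("pageviews-"):
            kinds[directory].add("pageviews")
        elif name.startswith("projectviews-"):
            kinds[directory].add("projectviews")
    # Keep only directories where projectviews is the sole kind present.
    return sorted(d for d, found in kinds.items() if found == {"projectviews"})


# Hypothetical listing: dir-a lacks pageviews files, dir-b has both kinds.
listing = [
    "dir-a/projectviews-0001",
    "dir-b/pageviews-0001.gz",
    "dir-b/projectviews-0001",
]
print(dirs_missing_pageviews(listing))
```

The resulting list would be the set of directories the backfill needs to cover.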

Creating an ad hoc Oozie workflow starting from might be a good first step.


This [WIP] patch [1] is the one that will add the pageviews dataset to and bring some general sanity to the analytics data presented there.


Looks good, @elukey. I saw the output files and they're what I'd expect. I think you can go ahead and start the backfill.

@elukey @Milimetric : Sounds good, but let's wait for encoding-issue-backfilling to be finished :)

Milimetric set the point value for this task to 13. (Mar 3 2016, 5:18 PM)

@elukey: I forgot to mention, the process is that @Nuria is the only one who closes tasks as resolved. That way she can "accept" that they're done.