
{loon} Refactor Data Dumps
Closed, Resolved (Public)

Description

Objective
Clean up, consolidate and simplify the pageview dumps using input from the community.

Key Results

  • get community input on which pageview (PV) dumps we should produce
  • assess capacity needs
  • consolidate/reorganize dumps
  • update http://dumps.wikimedia.org/other/index.html & notify the community
  • turn off obsolete dumps
  • document the time used in the filenames and the timezone used inside the files

Why
We are producing redundant data in different sets/formats. We have just added a new dump to be consumed by Wikistats. Cleaning up will save storage space and CPU cycles.

| Location | Date | Description |
| pagecounts-raw/ | 2007-now | Domas’ original PV dumps, no mobile before x |
| pagecounts-all-sites/Pagecounts* | 2014 | ... |
| pagecounts-all-sites/Projectcounts* | 2014 | ... |

Event Timeline

kevinator raised the priority of this task to Medium.
kevinator updated the task description.
kevinator moved this task to Parent Tasks on the Analytics-Kanban board.
kevinator subscribed.
kevinator renamed this task from {loon} to {loon} Refactor Data Dumps. (Oct 30 2015, 12:22 AM)
kevinator set Security to None.
Milimetric claimed this task.
Milimetric subscribed.

No longer tracking projects by animal names