Regularly purge EventLogging data in Hadoop {stag} [8 pts]
Closed, Resolved, Public

Description

We will likely need a script to do this. Some Python code I wrote in refinery may be usable for this as is, or it may need to be adapted.

Do we need to do this conditionally, per schema? Are some schemas exempt from purging? If so, then perhaps we need to make schema purge settings part of EventCapsule.
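
To make the per-schema question concrete, here is a minimal sketch of how a purge script could pick partitions to drop. It is not the refinery code or the merged patch. The function name, the override table, and the schema names are hypothetical; the 90-day default is taken from the title of the cron change on this task.

```python
from datetime import datetime, timedelta

# Assumption: 90 days, as in the cron patch title on this task.
DEFAULT_RETENTION_DAYS = 90

# Hypothetical per-schema overrides; None means "never purge this schema".
RETENTION_OVERRIDES = {"ExampleKeptSchema": None}


def partitions_to_drop(partitions, now,
                       default_days=DEFAULT_RETENTION_DAYS,
                       overrides=None):
    """Return the (schema, partition_datetime) pairs past their retention.

    `partitions` is an iterable of (schema_name, partition_datetime) tuples.
    """
    overrides = RETENTION_OVERRIDES if overrides is None else overrides
    expired = []
    for schema, part_dt in partitions:
        days = overrides.get(schema, default_days)
        if days is None:
            # Schema is exempt from purging.
            continue
        if part_dt < now - timedelta(days=days):
            expired.append((schema, part_dt))
    return expired


if __name__ == "__main__":
    now = datetime(2015, 9, 25)
    partitions = [
        ("ExampleSchemaA", datetime(2015, 6, 1)),
        ("ExampleSchemaA", datetime(2015, 9, 1)),
        ("ExampleKeptSchema", datetime(2015, 1, 1)),
    ]
    for schema, part_dt in partitions_to_drop(partitions, now):
        print(schema, part_dt.date())
```

Keeping the selection logic separate from the code that actually drops partitions means the retention rules can be checked without touching Hadoop. Whether such settings belong in a table like this or in EventCapsule is the open question above.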

Event Timeline

Ottomata created this task. Jul 19 2015, 5:53 AM
Ottomata raised the priority of this task to Normal.
Ottomata updated the task description.
Ottomata added subscribers: kevinator, Aklapper, Ottomata.
Ottomata renamed this task from Regularly purge EventLogging data in Hadoop to Regularly purge EventLogging data in Hadoop {stag}. Jul 19 2015, 6:08 AM
Ottomata set Security to None.
ggellerman moved this task from Incoming to Low on the Analytics-Backlog board. Jul 24 2015, 4:10 PM
ggellerman moved this task from Low to Medium on the Analytics-Backlog board.
Milimetric renamed this task from Regularly purge EventLogging data in Hadoop {stag} to Regularly purge EventLogging data in Hadoop {stag} [8 pts]. Sep 14 2015, 4:26 PM
Milimetric moved this task from Prioritized to Tasked on the Analytics-Backlog board.
This comment was removed by Milimetric.
mforns claimed this task. Sep 21 2015, 3:24 PM
mforns edited projects, added Analytics-Kanban; removed Analytics-Backlog.
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board. Sep 21 2015, 3:42 PM
mforns reassigned this task from mforns to madhuvishy. Sep 21 2015, 10:18 PM

Change 240299 had a related patch set uploaded (by Madhuvishy):
[WIP] Add script to drop old eventlogging partitions

https://gerrit.wikimedia.org/r/240299

Change 240299 merged by Ottomata:
Add script to drop old eventlogging partitions

https://gerrit.wikimedia.org/r/240299

Change 240449 had a related patch set uploaded (by Madhuvishy):
analytics: Add cron to drop Eventlogging data older than 90 days from hadoop

https://gerrit.wikimedia.org/r/240449

Change 240449 merged by Ottomata:
analytics: Add cron to drop Eventlogging data older than 90 days from hadoop

https://gerrit.wikimedia.org/r/240449

kevinator closed this task as Resolved. Sep 25 2015, 3:33 PM