Regularly purge EventLogging data in Hadoop {stag} [8 pts]
Closed, Resolved (Public)

Description

We will likely need a script to do this. Some Python code I wrote in refinery may be usable for this as is, or it may need to be adapted.

Do we need to do this conditionally based on schemas? Do some schemas not get purged? If so, then perhaps we need to make schema purge settings part of EventCapsule.
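
As a rough illustration of the question above, here is a minimal sketch of how a purge script might pick partitions to drop while skipping schemas that are exempt from purging. It is not the refinery code or the script from the patches referenced in this task. The partition tuple layout `(schema, year, month, day, hour)`, the function name `partitions_to_drop`, and the `exempt_schemas` parameter are all assumptions made for the example; the 90-day default mirrors the retention period named in the cron change.

```python
from datetime import datetime, timedelta


def partitions_to_drop(partitions, now, older_than_days=90,
                       exempt_schemas=frozenset()):
    """Return the partitions that fall before the retention cutoff.

    `partitions` is an iterable of (schema, year, month, day, hour)
    tuples. This layout is hypothetical, not the real table layout.
    """
    cutoff = now - timedelta(days=older_than_days)
    to_drop = []
    for schema, year, month, day, hour in partitions:
        if schema in exempt_schemas:
            # Hypothetical per-schema setting: never purge this schema.
            continue
        if datetime(year, month, day, hour) < cutoff:
            to_drop.append((schema, year, month, day, hour))
    return to_drop
```

Keeping the selection logic free of any Hive or HDFS calls makes it easy to test on its own; the caller would then issue the actual partition drops and file deletions for whatever this function returns.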

Event Timeline

Ottomata raised the priority of this task from to Medium.
Ottomata updated the task description.
Ottomata added subscribers: kevinator, Aklapper, Ottomata.
Ottomata renamed this task from Regularly purge EventLogging data in Hadoop to Regularly purge EventLogging data in Hadoop {stag}. Jul 19 2015, 6:08 AM
Ottomata set Security to None.
Milimetric renamed this task from Regularly purge EventLogging data in Hadoop {stag} to Regularly purge EventLogging data in Hadoop {stag} [8 pts]. Sep 14 2015, 4:26 PM
Milimetric moved this task from Prioritized to Tasked on the Analytics-Backlog board.

Change 240299 had a related patch set uploaded (by Madhuvishy):
[WIP] Add script to drop old eventlogging partitions

https://gerrit.wikimedia.org/r/240299

Change 240299 merged by Ottomata:
Add script to drop old eventlogging partitions

https://gerrit.wikimedia.org/r/240299

Change 240449 had a related patch set uploaded (by Madhuvishy):
analytics: Add cron to drop Eventlogging data older than 90 days from hadoop

https://gerrit.wikimedia.org/r/240449

Change 240449 merged by Ottomata:
analytics: Add cron to drop Eventlogging data older than 90 days from hadoop

https://gerrit.wikimedia.org/r/240449