
Add a job that regularly deletes druid webrequest deep-stored data
Closed, Resolved | Public | 5 Estimated Story Points

Description

Currently, Druid keeps 7 days of sampled webrequests on the historical nodes.
However, data is not deleted from deep storage, putting us in breach of the data retention policy.
We should have a job that deletes segments older than 60 days. We keep that much data so that, if a data emergency occurs, we can reload it easily.
How to delete deep-storage segments: see the second part of https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Delete_a_data_set_from_deep_storage

  • Script deleting data
  • Puppetization of script on webrequest datasource
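
To make the 60-day rule concrete, here is a minimal Python sketch of the interval computation and the task payload such a job could submit. It is not the script from the refinery patch. The function names, the `webrequest` datasource name, and the `2000-01-01` lower bound are assumptions for illustration. The payload follows Druid's `kill` task format, which removes segments from deep storage only after they have been marked unused.

```python
from datetime import datetime, timedelta, timezone


def deletion_interval(now, keep_days=60, oldest="2000-01-01"):
    """Return an ISO-8601 interval covering all data older than keep_days.

    `oldest` is an arbitrary lower bound (an assumption of this sketch),
    chosen to predate any data in the datasource.
    """
    cutoff = now - timedelta(days=keep_days)
    # Segments are time-chunked, so the cutoff is truncated to a day boundary.
    return "{}/{}".format(oldest, cutoff.strftime("%Y-%m-%d"))


def kill_task(datasource, interval):
    """Build a Druid 'kill' task payload for the given datasource and interval.

    The task only deletes segments already marked unused, so the interval
    has to be disabled on the coordinator first.
    """
    return {"type": "kill", "dataSource": datasource, "interval": interval}


if __name__ == "__main__":
    interval = deletion_interval(datetime.now(timezone.utc))
    # "webrequest" is a placeholder; use the real datasource name.
    print(kill_task("webrequest", interval))
```

The payload would then be POSTed as JSON to the overlord's `/druid/indexer/v1/task` endpoint. The sketch leaves out the HTTP calls so that the date logic can be checked on its own.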

Event Timeline

Change 361651 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Add script deleting druid deep storage data

https://gerrit.wikimedia.org/r/361651

First version of a Python script deleting data (tested, obviously).
After CR and when deployed, let's not forget to automate it for webrequest datasource using a puppet cron.

Change 361651 merged by Joal:
[analytics/refinery@master] Add script deleting druid deep storage data

https://gerrit.wikimedia.org/r/361651

We need to add the corresponding Puppet code to run this script as a cron job.

Change 362148 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Add cron job dropping webrequest from druid

https://gerrit.wikimedia.org/r/362148

Change 362396 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Upgrade script dropping druid deep-storage data

https://gerrit.wikimedia.org/r/362396

Change 362396 merged by Joal:
[analytics/refinery@master] Upgrade script dropping druid deep-storage data

https://gerrit.wikimedia.org/r/362396

Change 362148 merged by Elukey:
[operations/puppet@production] role::analytics_cluster::refinery::job::data_drop: drop old druid data

https://gerrit.wikimedia.org/r/362148

JAllemandou set the point value for this task to 5. Jul 6 2017, 8:32 AM