We currently keep wikitext_history snapshots indefinitely. They are heavy (~25 TB each), and we really don't need more than 2 or 3.
I reviewed the data-deletion scripts: wikitext_history snapshots are already being deleted, but 6 of them are kept.
See https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/data_purge.pp#L128 and https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots.
Change proposal: Remove the hardcoded lists from https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots and pass them as parameters instead.
That would let us run separate jobs with different retention times.
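To make the proposal concrete, here is a rough sketch of what a parameterized version could look like. It is not the actual refinery script: the flag names (`--tables`, `--keep-snapshots`), the helper names, and the assumption that snapshots are named `YYYY-MM` are all hypothetical and only for illustration.

```python
import argparse
import re

# Assumption: snapshot partitions are named like "2020-07" (YYYY-MM).
SNAPSHOT_RE = re.compile(r"^\d{4}-\d{2}$")


def snapshots_to_drop(snapshots, keep):
    """Return the snapshots that fall outside the `keep` most recent ones."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # YYYY-MM names sort chronologically when sorted as plain strings.
    valid = sorted(s for s in set(snapshots) if SNAPSHOT_RE.match(s))
    # Everything except the last `keep` entries is eligible for deletion.
    return valid[:-keep]


def parse_args(argv):
    """Parse hypothetical CLI flags replacing the lists kept in the script."""
    parser = argparse.ArgumentParser(
        description="Drop old snapshots (illustrative sketch only)."
    )
    parser.add_argument(
        "--tables",
        required=True,
        help="Comma-separated list of tables to purge (hypothetical flag).",
    )
    parser.add_argument(
        "--keep-snapshots",
        type=int,
        default=6,
        help="Number of most recent snapshots to keep (hypothetical flag).",
    )
    args = parser.parse_args(argv)
    args.tables = [t.strip() for t in args.tables.split(",") if t.strip()]
    return args


if __name__ == "__main__":
    # Example values only; a real job would list snapshots from the cluster.
    args = parse_args(["--tables", "wmf.mediawiki_wikitext_history",
                       "--keep-snapshots", "2"])
    existing = ["2020-04", "2020-05", "2020-06", "2020-07"]
    for table in args.tables:
        print(table, snapshots_to_drop(existing, args.keep_snapshots))
```

With something like this, one job could be scheduled with a small `--keep-snapshots` value for the heavy wikitext_history data and another job with a larger value for the lighter datasets.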
@mforns Thoughts?
Yes, definitely :]
Change 623586 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update drop-mediawiki-snapshots parameters and datasets
Change 623601 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Update analytics snapshots data purge
Change 623586 merged by Joal:
[analytics/refinery@master] Update drop-mediawiki-snapshots parameters and datasets
Change 623601 merged by Ottomata:
[operations/puppet@production] Update analytics snapshots data purge