Update data-purge for processed mediawiki_wikitext_history (6 snapshot kept, 3 would be sufficient)
Closed, Resolved, Public

Description

We currently keep them indefinitely. They are heavy (~25 TB each), and we really don't need more than 2 or 3.

Event Timeline

I reviewed the data-deletion scripts: wikitext_history snapshots are already being deleted, but 6 of them are kept.
See https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/data_purge.pp#L128 and https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots.

Change proposal: Remove the lists from https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots and pass them as parameters.
Having this would allow us to have different jobs for different retention times.
@mforns Thoughts?
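
To illustrate the proposal, here is a minimal sketch of a drop script that takes its table list and retention count as parameters. It is not the actual refinery-drop-mediawiki-snapshots code: the flag names `--tables` and `--keep-snapshots`, the function names, and the table names in the usage note are all hypothetical. It also assumes that the snapshots kept are the most recent ones, and that snapshot names look like `2019-10` so that they sort chronologically.

```python
import argparse


def snapshots_to_drop(snapshots, keep):
    """Return the snapshots to delete, keeping the `keep` most recent ones.

    Assumes snapshot names such as "2019-10", which sort chronologically
    when compared as plain strings.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(set(snapshots))
    # Everything except the last `keep` entries is eligible for deletion.
    return ordered[:-keep]


def parse_args(argv=None):
    # Hypothetical flags, for illustration only; they are not taken from
    # the real script.
    parser = argparse.ArgumentParser(description="Drop old snapshots (sketch)")
    parser.add_argument("--tables", required=True,
                        help="comma-separated list of tables to purge")
    parser.add_argument("--keep-snapshots", type=int, default=6,
                        help="number of most recent snapshots to keep")
    args = parser.parse_args(argv)
    args.tables = [t.strip() for t in args.tables.split(",") if t.strip()]
    return args
```

With something along these lines, one scheduled job could pass `--tables some_wikitext_table --keep-snapshots 3` while another keeps 6 snapshots of lighter tables, without either list being fixed in the script.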

JAllemandou renamed this task from Add data-purge for processed mediawiki_wikitext_history to Update data-purge for processed mediawiki_wikitext_history (6 snapshot kept, 3 would be sufficient). Nov 1 2019, 12:35 PM
fdans triaged this task as Medium priority. Nov 4 2019, 5:07 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

@JAllemandou

Change proposal: Remove the lists from https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots and pass them as parameters.
Having this would allow us to have different jobs for different retention times.
@mforns Thoughts?

Yes, definitely :]

Milimetric raised the priority of this task from Medium to Unbreak Now!.
Milimetric lowered the priority of this task from Unbreak Now! to Needs Triage.
Milimetric triaged this task as High priority.
Milimetric added a project: Analytics-Kanban.

Change 623586 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update drop-mediawiki-snapshots parameters and datasets

https://gerrit.wikimedia.org/r/623586

Change 623601 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Update analytics snapshots data purge

https://gerrit.wikimedia.org/r/623601

Change 623586 merged by Joal:
[analytics/refinery@master] Update drop-mediawiki-snapshots parameters and datasets

https://gerrit.wikimedia.org/r/623586

Change 623601 merged by Ottomata:
[operations/puppet@production] Update analytics snapshots data purge

https://gerrit.wikimedia.org/r/623601