
Gather all data-purge into a single job
Open, Medium, Public

Description

Analytics data purging is spread over multiple jobs that are run in different ways (one timer per dataset, or several datasets handled by a single timer).
I suggest creating a script that picks up the data to clean from configuration, making purges easier to maintain (a single point of configuration for all data purges). This script could be run at different time intervals (with a time-interval parameter telling the script which data to work on), and would take advantage of the different purging strategies we already have solutions for (time-partitioned as in webrequest, snapshot-based, Hive or not).
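
As a rough illustration of the idea, here is a minimal Python sketch of a configuration-driven purge selector. Everything in it is an assumption made for illustration: the `PurgeJob` fields, the strategy names, the `example_snapshot_table` dataset and all retention values are hypothetical and do not reflect the existing purge jobs. It only computes which partitions or snapshots would be dropped; the actual deletion (Hive or HDFS) is left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PurgeJob:
    dataset: str    # dataset name (illustrative)
    strategy: str   # "time_partition" or "snapshot" (hypothetical names)
    interval: str   # which run of the script handles it, e.g. "daily"
    keep: int       # days to keep (time_partition) or snapshots to keep (snapshot)


# Single point of configuration for all purges (values are made up).
PURGE_CONFIG = [
    PurgeJob("webrequest", "time_partition", "daily", 90),
    PurgeJob("example_snapshot_table", "snapshot", "monthly", 6),
]


def jobs_for_interval(config, interval):
    """Return the jobs the script should handle for the given time interval."""
    return [job for job in config if job.interval == interval]


def partitions_to_purge(job, partitions, today):
    """Return the partitions (as dates) that the job's strategy would drop."""
    if job.strategy == "time_partition":
        # Drop every partition strictly older than the retention window.
        cutoff = today - timedelta(days=job.keep)
        return sorted(p for p in partitions if p < cutoff)
    if job.strategy == "snapshot":
        # Keep only the most recent `keep` snapshots, drop the rest.
        ordered = sorted(partitions)
        return ordered[:-job.keep] if job.keep > 0 else ordered
    raise ValueError(f"unknown purge strategy: {job.strategy}")
```

A timer would then call the script once per interval, for example `jobs_for_interval(PURGE_CONFIG, "daily")`, and apply `partitions_to_purge` to each returned job. Adding a dataset becomes a one-line configuration change rather than a new job.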

Event Timeline

Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

This sounds pretty big, and I think it is related to our desire to refactor our sanitization pipeline and processes. I'm moving this back into Incoming so we can groom it as a non-ops-excellence task.

Milimetric lowered the priority of this task from High to Medium. Jan 14 2021, 6:04 PM
Milimetric moved this task from Incoming to Smart Tools for Better Data on the Analytics board.