Analytics data purging is spread across multiple jobs that are scheduled in different ways (one timer per dataset, or several datasets handled by a single timer).
I suggest creating a script that determines which data to clean from configuration, which would make purging easier to maintain (a single point of configuration for all data purges). The script could run at different time intervals, taking the interval as a parameter so that it knows which data to process, and would reuse the purging strategies we already have solutions for (time-partitioned data as in webrequest, snapshots, Hive or non-Hive data).
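To make the proposal concrete, here is a minimal sketch of what a configuration-driven purge selection could look like. Everything in it is an assumption for illustration only: the `PurgeRule` fields, the strategy and interval names, the retention values, and the `example_snapshot_table` dataset are hypothetical and do not describe any existing job. The sketch only decides what to drop; the actual deletion (Hive partition drop, file removal) is left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PurgeRule:
    """One entry of the single purge configuration (hypothetical schema)."""
    dataset: str             # dataset or table name
    strategy: str            # "time_partition" or "snapshot"
    interval: str            # which scheduled run handles it, e.g. "daily"
    retention_days: int = 0  # used by the "time_partition" strategy
    keep_snapshots: int = 0  # used by the "snapshot" strategy


# Illustrative values only; real retention periods are not stated in this task.
RULES = [
    PurgeRule("webrequest", "time_partition", "daily", retention_days=90),
    PurgeRule("example_snapshot_table", "snapshot", "monthly", keep_snapshots=6),
]


def rules_for(interval, rules=RULES):
    """Return the rules that a run with the given interval parameter should process."""
    return [rule for rule in rules if rule.interval == interval]


def partitions_to_drop(rule, existing, today):
    """Return the partition keys that the rule marks for deletion.

    existing: partition keys as strings. For "time_partition" they are
    ISO dates (YYYY-MM-DD); for "snapshot" they are labels that sort
    chronologically, such as YYYY-MM.
    """
    if rule.strategy == "time_partition":
        cutoff = today - timedelta(days=rule.retention_days)
        return sorted(p for p in existing if date.fromisoformat(p) < cutoff)
    if rule.strategy == "snapshot":
        ordered = sorted(existing)
        # Keep the newest `keep_snapshots` entries, drop everything older.
        return ordered[:max(0, len(ordered) - rule.keep_snapshots)]
    raise ValueError(f"Unknown purge strategy: {rule.strategy}")
```

A daily timer would then call `rules_for("daily")` and pass each rule to `partitions_to_drop`, so adding a dataset to the purge only requires a new entry in `RULES`.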
Description
Status | Subtype | Assigned | Task
---|---|---|---
Duplicate | | None | T272060 Implement Data Governance Tool
Open | | None | T262201 Gather all data-purge into a single job
Event Timeline
This sounds pretty big, and I think it is related to our desire to refactor our sanitization pipeline and processes. I'm moving this back into incoming so we can groom it as a non ops-excellence task.