
Growth: delete data older than 90 days
Closed, Resolved (Public)

Description

The Growth team has had an exception to the data retention policy, which was implemented as a deletion of sanitized data in T237124. We're looking to end this exception and, as the parent task lays out, deleting the old data is the last of the steps needed to complete this work. We're asking to have the following tables deleted:

  1. event_sanitized.homepagevisit
  2. event_sanitized.homepagemodule
  3. event_sanitized.helppanel
  4. event_sanitized.newcomertask
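
For reference, the request above could be expressed as one HiveQL `DROP TABLE` statement per table. The following is only a minimal sketch: the helper name `build_drop_statements` is hypothetical, it only builds the statement strings and does not run anything, and it is not necessarily how the deletion will actually be carried out.

```python
# Hypothetical helper: builds HiveQL statement strings only.
# It does not connect to or run anything against a cluster.
DATABASE = "event_sanitized"
TABLES = ["homepagevisit", "homepagemodule", "helppanel", "newcomertask"]


def build_drop_statements(database, tables):
    """Return one DROP TABLE statement per table, in the order given."""
    return [f"DROP TABLE IF EXISTS {database}.{table};" for table in tables]


if __name__ == "__main__":
    # Print the statements so they can be reviewed before anyone runs them.
    for statement in build_drop_statements(DATABASE, TABLES):
        print(statement)
```

Note that if a Hive table is external, dropping it removes only the table metadata and the underlying files have to be removed separately. Whether that applies to these tables is not stated in this task.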

Event Timeline

Thanks, @Rileych! The PA team will prioritize this in our next board refinement meeting (Feb 9).

Hi @mpopov, is there anything needed from us for this task and the parent?

@fdans: Maybe? I'm not sure what modifications @mforns made to the standard sanitization process to enable 270 days of retention for @nettrom_WMF's Growth event data.

@fdans: The parent task asks to revert the changes made in T237124. There's also the second child task, T273826, for updating the whitelist. Once those two are completed, we can delete the associated four tables in event_sanitized, which is what this task is about. I'll update the task description to reflect that specific ask. I see now that all of this hasn't been very clear. Maybe I should make a new parent task that lays all of this out and mentions all of the child tasks in the description?

I updated the task description to make the ask clear. I've also updated the description of the parent task to make it clearer what we're looking to do, and created a new child task for the part that didn't already have one. Hopefully this makes everything less confusing, but please let me know if it doesn't and I'll do my best to help out!

I just merged the patch that removes the Growth schemas from the include-list (T273826).
When that is deployed, I will delete the 4 tables from the event_sanitized database.
I will also remove some puppet code that was purging those tables after 270 days.
Will ping you when done.
Cheers!

Change 665326 had a related patch set uploaded (by Mforns; owner: Mforns):
[operations/puppet@production] analytics:refinery:job:data_purge: Absent Growth deletion jobs

https://gerrit.wikimedia.org/r/665326

Change 665328 had a related patch set uploaded (by Mforns; owner: Mforns):
[operations/puppet@production] analytics:refinery:job:data_purge: Remove Growth deletion jobs

https://gerrit.wikimedia.org/r/665328

The data has been deleted!
Once those patches (which remove the now-unused jobs) get merged, we can close this task.

Change 665326 merged by Razzi:
[operations/puppet@production] analytics:refinery:job:data_purge: Absent Growth deletion jobs

https://gerrit.wikimedia.org/r/665326

Change 665328 merged by Razzi:
[operations/puppet@production] analytics:refinery:job:data_purge: Remove Growth deletion jobs

https://gerrit.wikimedia.org/r/665328

The tables have been deleted and we've verified that this didn't break anything in the Growth team's reporting, so closing as resolved.