
Help panel: delete sanitized data from before Oct 1
Closed, Resolved · Public


We have completed our Help Panel experiment and reached the end of our data retention period. We would therefore like someone from Analytics to delete all data from before October 1, 2019 from event_sanitized.helppanel.

In addition, @nettrom_WMF will delete nettrom_growth.helppanel_0410, which contains an earlier set of sanitized data.

Event Timeline

nettrom_WMF moved this task from Incoming to In Progress on the Growth-Team (Current Sprint) board.
nettrom_WMF moved this task from Triage to Doing on the Product-Analytics board.

The second part of this, deleting the initial set of data myself, has been done:

hive (nettrom_growth)> DROP TABLE helppanel_0410;
Time taken: 0.274 seconds
hive (nettrom_growth)> SELECT * FROM helppanel_0410;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'helppanel_0410'

Next step is for Analytics to get back to us about the other part.

nettrom_WMF triaged this task as High priority. Oct 8 2019, 5:58 PM
fdans raised the priority of this task from High to Unbreak Now!. Oct 10 2019, 5:09 PM
Restricted Application added a subscriber: Liuxinyu970226. Oct 10 2019, 5:09 PM
fdans lowered the priority of this task from Unbreak Now! to High. Oct 10 2019, 5:10 PM
fdans added a subscriber: fdans. Oct 10 2019, 5:20 PM

@nettrom_WMF can we confirm that the range to be deleted is from the beginning of time up to Oct 1? Would this mean deleting all fields?

fdans moved this task from Incoming to Ops Week on the Analytics board. Oct 10 2019, 5:21 PM

@fdans : Can confirm that the range to be deleted is from the beginning of time (which is around April 2019) up to Oct 1. And yes, all fields are to be deleted. Thanks!

@fdans : thank you for working on this. I just want to mention that we consider this a high-priority task and are hoping it can be completed within days. The reason is that we're at the end of the data retention period and want to delete user data promptly. In the future, we'll be sure to give more notice.

mforns added a subscriber: mforns. Oct 15 2019, 3:32 PM

I have deleted all data directories and Hive partitions for event_sanitized.helppanel up to, but not including, Oct 1st 2019.
I checked that the table looks good, but please ping us if you find any inconsistency.
The deleted data will stay in Hadoop's trash folder for a couple of weeks, in case you want to recover something, and will then be deleted automatically.
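
For readers who want to see what a partition cleanup of this kind can look like, here is a minimal sketch. It is not necessarily what was run for this task. It assumes the table is partitioned by integer columns named year, month and day (an assumption, not confirmed in this thread) and that the Hive version in use accepts comparison operators in DROP PARTITION specs. The helper name drop_partitions_before is hypothetical. If the table is external, dropping partitions removes only the metadata, so the data directories still have to be removed separately.

```python
from datetime import date


def drop_partitions_before(table: str, cutoff: date) -> str:
    """Build a HiveQL statement that drops every partition strictly before `cutoff`.

    Assumption: the table has integer partition columns year, month, day.
    """
    # Everything in earlier years.
    specs = [f"PARTITION (year < {cutoff.year})"]
    # Earlier months of the cutoff year (only needed if the cutoff is not in January).
    if cutoff.month > 1:
        specs.append(f"PARTITION (year = {cutoff.year}, month < {cutoff.month})")
    # Earlier days of the cutoff month (only needed if the cutoff is not the 1st).
    if cutoff.day > 1:
        specs.append(
            f"PARTITION (year = {cutoff.year}, month = {cutoff.month}, day < {cutoff.day})"
        )
    return f"ALTER TABLE {table} DROP IF EXISTS " + ",\n  ".join(specs) + ";"


if __name__ == "__main__":
    # Cutoff is exclusive: Oct 1, 2019 itself is kept.
    print(drop_partitions_before("event_sanitized.helppanel", date(2019, 10, 1)))
```

The generated statement can be reviewed before pasting it into the hive shell, which is safer than running a destructive command directly from a script.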

mforns claimed this task. Oct 15 2019, 3:34 PM
mforns added a project: Analytics-Kanban.
mforns moved this task from Next Up to Done on the Analytics-Kanban board.
nettrom_WMF closed this task as Resolved. Oct 15 2019, 3:57 PM

@mforns : thanks for taking care of this! I've verified that the table doesn't contain any data prior to Oct 1st. Everything looks good here, so I'm closing this.
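
A check like the one described above could be sketched as follows. This is an illustration, not the query that was actually used. It assumes integer partition columns named year, month and day, and the helper name count_rows_before and the column alias leftover_rows are hypothetical. A result of 0 would mean no rows remain from before the cutoff.

```python
from datetime import date


def count_rows_before(table: str, cutoff: date) -> str:
    """Build a HiveQL query counting rows in partitions strictly before `cutoff`.

    Assumption: the table has integer partition columns year, month, day.
    """
    y, m, d = cutoff.year, cutoff.month, cutoff.day
    # Filtering on partition columns lets Hive prune partitions instead of
    # scanning the whole table.
    predicate = (
        f"year < {y} "
        f"OR (year = {y} AND month < {m}) "
        f"OR (year = {y} AND month = {m} AND day < {d})"
    )
    return f"SELECT COUNT(*) AS leftover_rows FROM {table} WHERE {predicate};"


if __name__ == "__main__":
    print(count_rows_before("event_sanitized.helppanel", date(2019, 10, 1)))
```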