
Purge all Schema:Echo data after 90 days
Closed, Resolved · Public · 3 Estimated Story Points

Description

The Collaboration team is phasing out its use of Schema:Echo [1]. The easiest way to delete all the data is to use one of our existing purge strategies: purge everything after 90 days. @jcrespo, let me know if you have any concerns.

[1] https://meta.wikimedia.org/wiki/Schema_talk:Echo


Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 2 2016, 7:00 PM
Milimetric triaged this task as Low priority. · Mar 2 2016, 7:03 PM
Restricted Application added a project: Collaboration-Team-Triage. · Mar 2 2016, 7:19 PM
Milimetric moved this task from Incoming to Radar on the Analytics board. · Mar 7 2016, 5:06 PM

When is this taking effect (the 90-day deadline)? Even if data is deleted, I have to make sure tables are also deleted on all servers.

If it's easier, @jcrespo, you can just delete all Echo_% tables at any time; we have confirmation from Roan that they no longer need that data. I see:

Echo_5285750
Echo_5364744
Echo_5423520
Echo_6081131
Echo_7572295
Echo_7731316
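A minimal sketch of how the tables listed above could be selected and turned into DROP statements. It assumes only the six names from this comment (hard-coded, not queried from a server); the helper `drop_statements` and the extra `EchoInteraction_123` name are hypothetical, included only to show what the pattern should not match. Running the statements on each server is left to the DBAs.

```python
import re

# Table names as listed in this comment. A real run would read them from
# SHOW TABLES or information_schema.TABLES instead of hard-coding them.
TABLES = [
    "Echo_5285750",
    "Echo_5364744",
    "Echo_5423520",
    "Echo_6081131",
    "Echo_7572295",
    "Echo_7731316",
    "EchoInteraction_123",  # hypothetical other-schema table, must NOT match
]

# EventLogging tables are named <Schema>_<revision>, so require the literal
# "Echo_" prefix followed only by digits.
ECHO_TABLE = re.compile(r"Echo_\d+")


def drop_statements(table_names):
    """Return one DROP TABLE statement per Echo_<revision> table (hypothetical helper)."""
    return [
        f"DROP TABLE IF EXISTS `{name}`;"
        for name in table_names
        if ECHO_TABLE.fullmatch(name)
    ]


for stmt in drop_statements(TABLES):
    print(stmt)
```

The explicit regex matters if the selection is done in SQL instead: in `LIKE 'Echo_%'` the underscore is a single-character wildcard, so it would also match `EchoInteraction_123`; escaping it as `LIKE 'Echo\_%'` avoids that.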

If you wanted to wait until all data is 90 days old, that will happen on June 5th.

(sorry for delay, was on vacation)

1978Gage2001 moved this task from Triage to In progress on the DBA board. · Dec 11 2017, 9:46 AM
Marostegui moved this task from In progress to Triage on the DBA board. · Dec 11 2017, 10:56 AM

@Milimetric, with all the clean-up work done on the EL servers, is this still a valid task?

It looks to me like those tables still exist and there's still data on the box that analytics-slave points to, so yeah, I think they need to be deleted. But I agree it's weird that there's still data; it should've been deleted by the clean-up scripts. @mforns, there's data here (for example in Echo_7731316) from 2014, but it's not whitelisted, right?

mforns added a subscriber: elukey. · Edited Mar 13 2018, 1:51 PM

The Echo schema is present in EventLogging's purging white-list, see:
https://github.com/wikimedia/puppet/blob/production/modules/profile/files/mariadb/misc/eventlogging/eventlogging_purging_whitelist.tsv#L42
Hence, the purging script keeps the following fields for all Echo schemas:

Echo	clientValidated
Echo	event_deliveryMethod
Echo	event_eventSource
Echo	event_notificationGroup
Echo	event_notificationType
Echo	event_revisionId
Echo	event_sender
Echo	event_version
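To make the whitelist behaviour concrete, here is a minimal sketch that parses the `<schema>\t<field>` lines above and sanitizes a single row. It is not the actual EventLogging purging script: it assumes rows are dicts carrying a `timestamp`, that "purging" means setting non-whitelisted fields to NULL (`None`) once a row is older than 90 days, and that `timestamp` itself is kept. The function names and the `wiki` field in the usage example are hypothetical.

```python
from datetime import datetime, timedelta

# The Echo lines of the purging whitelist quoted above: <schema>\t<field>.
WHITELIST_TSV = """\
Echo\tclientValidated
Echo\tevent_deliveryMethod
Echo\tevent_eventSource
Echo\tevent_notificationGroup
Echo\tevent_notificationType
Echo\tevent_revisionId
Echo\tevent_sender
Echo\tevent_version
"""

RETENTION = timedelta(days=90)


def parse_whitelist(tsv):
    """Map schema name -> set of fields that survive purging."""
    whitelist = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue
        schema, field = line.split("\t")
        whitelist.setdefault(schema, set()).add(field)
    return whitelist


def sanitize(schema, row, now, whitelist):
    """Null out non-whitelisted fields of rows older than RETENTION.

    Assumption: 'timestamp' is always kept, and a schema with no whitelist
    entry keeps nothing else.
    """
    if now - row["timestamp"] <= RETENTION:
        return dict(row)  # still inside the retention window: untouched
    keep = whitelist.get(schema, set()) | {"timestamp"}
    return {k: (v if k in keep else None) for k, v in row.items()}
```

For example, with `now = datetime(2018, 3, 13)`, a 2014 Echo row `{"timestamp": datetime(2014, 6, 1), "event_sender": 42, "wiki": "enwiki"}` keeps `event_sender` and has `wiki` set to `None`. This is why the 2014 Echo data is still there: removing Echo from the whitelist is what would let those fields be purged too.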

If we remove the schema from the white-list, from now on the purging script will remove the corresponding data once it is older than 90 days, but historical data that is already older than 91 days as of today will still need manual purging (the purging script runs every day and only affects the 91st day back, not older historical data).
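The point that a daily run "only affects the 91st day" can be illustrated with a small date sketch. It assumes "the 91st day" means the calendar day exactly 91 days before the run date; `daily_purge_day` and `purged_days` are hypothetical names, not part of the real purging script.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90


def daily_purge_day(run_date, retention_days=RETENTION_DAYS):
    """The single calendar day a daily run touches: (retention + 1) days back."""
    return run_date - timedelta(days=retention_days + 1)


def purged_days(first_run, last_run, retention_days=RETENTION_DAYS):
    """Set of calendar days touched by one run per day from first_run to last_run."""
    n_runs = (last_run - first_run).days + 1
    return {
        daily_purge_day(first_run + timedelta(days=i), retention_days)
        for i in range(n_runs)
    }
```

A run on 2018-03-13 touches only 2017-12-12, and daily runs starting the next day only move forward from 2017-12-13, so data from 2014 is never revisited. That is why the older Echo data needs a one-off purge or a table drop.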
We could run the purging script over everything from the beginning of time, but I'm not sure we can restrict which tables it processes. Is it possible to limit the MySQL purging script to only the given tables, @elukey? Otherwise, it might be easier to just drop the tables.

What should we do with this task?

Restricted Application added a project: Growth-Team. · Sep 7 2018, 6:31 PM

This needs to be coordinated between me and @mforns when he is back from vacation. I'm going to put this task in our Incoming Backlog column so my team can triage it again.

elukey raised the priority of this task from Low to Needs Triage. · Sep 10 2018, 8:09 AM
elukey moved this task from Radar to Incoming on the Analytics board.
Nuria added a subscriber: Nuria. · Sep 10 2018, 4:29 PM

Let's 1) stop purging, 2) drop all Echo tables in the events and events sanitized databases, and 3) start purging again.

Makes sense. Also MySQL, right, or are those already deleted?

Nuria added a comment. · Sep 10 2018, 5:21 PM

Looks like all the tables in the MySQL db also need to be deleted.

Milimetric assigned this task to elukey. · Sep 13 2018, 4:27 PM
Milimetric triaged this task as High priority.
Milimetric lowered the priority of this task from High to Medium.
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.
Milimetric added a project: Analytics-Kanban.

Change 467983 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/refinery@master] Remove Echo schema from EL sanitization white-list

https://gerrit.wikimedia.org/r/467983

Change 467983 merged by Mforns:
[analytics/refinery@master] Remove Echo schema from EL sanitization white-list

https://gerrit.wikimedia.org/r/467983

Tables dropped with Marcel on db110[7,8] (EventLogging master/slave). Marcel checked, and there is nothing on HDFS.

The above code change has been merged, no more remaining actions!

elukey moved this task from Next Up to Done on the Analytics-Kanban board. · Oct 17 2018, 3:18 PM
elukey set the point value for this task to 3.
Nuria closed this task as Resolved. · Oct 19 2018, 2:21 AM