The collaboration team is phasing out their use of Schema:Echo [1]. The easiest way to delete all the data is just to use one of our existing purge strategies: purge all after 90 days. @jcrespo let me know if you have any concerns.
Description
Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
analytics/refinery | master | +0 -12 | Remove Echo schema from EL sanitization white-list |
Event Timeline
When is this taking effect (the 90-day deadline)? Even if data is deleted, I have to make sure tables are also deleted on all servers.
If it's easier, @jcrespo, you can just delete all Echo_% tables any time, we have confirmation from Roan that they don't need that data any more. I see:
Echo_5285750
Echo_5364744
Echo_5423520
Echo_6081131
Echo_7572295
Echo_7731316
If you wanted to wait until all data is 90 days old, that will happen on June 5th.
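For the manual route, the DROP statements for the six tables listed above could be generated with a small script. This is only a sketch: the database name `log` and the helper `drop_statements` are assumptions for illustration and do not come from this task, and the output should be reviewed before anything is run on a server.

```python
import re

# Table names as listed in this task.
ECHO_TABLES = [
    "Echo_5285750",
    "Echo_5364744",
    "Echo_5423520",
    "Echo_6081131",
    "Echo_7572295",
    "Echo_7731316",
]

# Only accept names of the form Echo_<digits>, so that no other
# table can end up in a DROP statement by mistake.
ECHO_TABLE_RE = re.compile(r"Echo_\d+")


def drop_statements(tables, database="log"):
    """Return one DROP TABLE statement per table.

    The default database name is an assumption, not taken from the task.
    """
    statements = []
    for name in tables:
        if not ECHO_TABLE_RE.fullmatch(name):
            raise ValueError(f"refusing to drop unexpected table: {name!r}")
        statements.append(f"DROP TABLE IF EXISTS `{database}`.`{name}`;")
    return statements


if __name__ == "__main__":
    # Print the statements for review; nothing is executed here.
    for statement in drop_statements(ECHO_TABLES):
        print(statement)
```

The script only prints SQL, so whoever runs the drop still decides when and on which hosts to apply it.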
(sorry for delay, was on vacation)
@Milimetric with all the clean up work done on EL servers, is this still a valid task?
It looks to me like those tables still exist and there's still data on the box that analytics-slave points to, so yeah, I think they need to be deleted. But I agree it's weird there's still data, it should've been deleted by the clean-up scripts. @mforns there's data here (for example Echo_7731316) from 2014 but it's not whitelisted, right?
The Echo schema is present in EventLogging's purging white-list, see:
https://github.com/wikimedia/puppet/blob/production/modules/profile/files/mariadb/misc/eventlogging/eventlogging_purging_whitelist.tsv#L42
Hence the purging script is keeping the following fields for all Echo schemas:
Echo clientValidated
Echo event_deliveryMethod
Echo event_eventSource
Echo event_notificationGroup
Echo event_notificationType
Echo event_revisionId
Echo event_sender
Echo event_version
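To illustrate what "keeping" those fields means, here is a minimal sketch of whitelist-based sanitization. It is not the actual purging script: `load_whitelist` and `sanitize` are hypothetical helpers, the assumption that non-whitelisted fields are set to NULL (here `None`) is mine, and `userAgent` is only an example of a field that is absent from the list.

```python
import csv
import io

# The Echo fields quoted above, rebuilt as tab-separated (schema, field) pairs.
ECHO_FIELDS = [
    "clientValidated",
    "event_deliveryMethod",
    "event_eventSource",
    "event_notificationGroup",
    "event_notificationType",
    "event_revisionId",
    "event_sender",
    "event_version",
]
WHITELIST_TSV = "\n".join(f"Echo\t{field}" for field in ECHO_FIELDS)


def load_whitelist(tsv_text):
    """Parse (schema, field) rows into a dict of schema -> set of fields."""
    whitelist = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        schema, field = row
        whitelist.setdefault(schema, set()).add(field)
    return whitelist


def sanitize(schema, row, whitelist):
    """Keep whitelisted fields, blank the rest (assumed behaviour).

    Returns None when the schema has no whitelist entry at all, standing in
    for the whole row being purged.
    """
    if schema not in whitelist:
        return None
    kept = whitelist[schema]
    return {field: (value if field in kept else None) for field, value in row.items()}


if __name__ == "__main__":
    whitelist = load_whitelist(WHITELIST_TSV)
    old_row = {"event_notificationType": "mention", "userAgent": "example-agent"}
    print(sanitize("Echo", old_row, whitelist))
```

Under these assumptions, removing the Echo lines from the whitelist makes `sanitize` return `None` for Echo rows, which corresponds to the data being purged entirely once it passes the retention period.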
If we remove the schema from the white-list, the purging script will start removing corresponding data older than 90 days from now on, but the historical data older than 91 days as of today will still need manual purging (the purging script executes every day and only affects the 91st day, not historical data).
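The gap between the daily run and the historical backlog can be shown with a bit of date arithmetic. This is a sketch under the assumption that each run handles exactly the day that is 91 days old; the function names and the example date are made up for illustration.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention period named in this task


def daily_purge_day(today):
    """Day handled by the run on `today`: the one that just passed the retention limit."""
    return today - timedelta(days=RETENTION_DAYS + 1)


def needs_manual_purge(event_day, today):
    """True if the data is older than the single day the daily run touches."""
    return event_day < daily_purge_day(today)


if __name__ == "__main__":
    today = date(2018, 10, 18)  # arbitrary example date
    print("daily run handles:", daily_purge_day(today))
    # Data from 2014 is far older than that single day, so the daily run skips it.
    print("2014 data needs manual purge:", needs_manual_purge(date(2014, 6, 1), today))
```

Anything for which `needs_manual_purge` is true has to be handled by a one-off run over the full history or by dropping the tables.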
We could execute the purging script from the beginning of time, but I'm not sure we can restrict the tables it will process. Is it possible to limit the mysql purging script to only process the given tables, @elukey? Otherwise, it might be easier to just drop the tables.
This needs to be coordinated between me and @mforns when he is back from vacation. I'm going to put this task in our Incoming Backlog column so that my team triages it again.
Let's 1) stop purging 2) drop all echo tables on events and events sanitized database 3) start purging again
Makes sense. The MySQL tables too, right? Or have those already been deleted?
Change 467983 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/refinery@master] Remove Echo schema from EL sanitization white-list
Change 467983 merged by Mforns:
[analytics/refinery@master] Remove Echo schema from EL sanitization white-list
Tables dropped with Marcel on db110[7,8] (eventlogging master/slave). Marcel checked and nothing is there on HDFS.
The above code change has been merged, no more remaining actions!