Purge all Schema:Echo data after 90 days
Closed, Resolved. Public. 3 Estimated Story Points


The collaboration team is phasing out their use of Schema:Echo [1]. The easiest way to delete all the data is just to use one of our existing purge strategies: purge all after 90 days. @jcrespo let me know if you have any concerns.


Event Timeline

When is this taking effect (the 90-day deadline)? Even if data is deleted, I have to make sure tables are also deleted on all servers.

If it's easier, @jcrespo, you can just delete all Echo_% tables any time, we have confirmation from Roan that they don't need that data any more. I see:


If you wanted to wait until all data is 90 days old, that will happen on June 5th.

(sorry for delay, was on vacation)

@Milimetric with all the clean up work done on EL servers, is this still a valid task?

It looks to me like those tables still exist and there's still data on the box that analytics-slave points to, so yeah, I think they need to be deleted. But I agree it's weird there's still data, it should've been deleted by the clean-up scripts. @mforns there's data here (for example Echo_7731316) from 2014 but it's not whitelisted, right?

The Echo schema is present in EventLogging's purging white-list, see:
Hence the purging script is keeping the following fields for all Echo schemas:

Echo	clientValidated
Echo	event_deliveryMethod
Echo	event_eventSource
Echo	event_notificationGroup
Echo	event_notificationType
Echo	event_revisionId
Echo	event_sender
Echo	event_version
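As a rough illustration of what keeping those fields means, here is a minimal Python sketch of white-list sanitization. It assumes the purging script blanks non-white-listed fields rather than deleting the row, which this thread does not confirm. `sanitize_row` and the `userAgent` field are hypothetical names used only for the example.

```python
# Fields listed above as kept for all Echo schemas.
ECHO_WHITELIST = frozenset({
    "clientValidated",
    "event_deliveryMethod",
    "event_eventSource",
    "event_notificationGroup",
    "event_notificationType",
    "event_revisionId",
    "event_sender",
    "event_version",
})


def sanitize_row(row, whitelist=ECHO_WHITELIST):
    """Return a copy of `row` with every non-white-listed field set to None.

    Assumption: the real script nulls such fields; this is only a sketch.
    """
    return {
        field: (value if field in whitelist else None)
        for field, value in row.items()
    }


if __name__ == "__main__":
    # "userAgent" is a hypothetical non-white-listed column.
    print(sanitize_row({"event_sender": 42, "userAgent": "example"}))
```

Under that assumption, removing Echo from the white-list would leave nothing to keep, so whole rows become eligible for purging.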

If we remove the schema from the white-list, the purging script will start removing corresponding data older than 90 days from now on, but the historical data older than 91 days as of today will still need manual purging (the purging script executes every day and only affects the 91st day, not historical data).
We could execute the purging script from the beginning of time, but I'm not sure we can restrict the tables it will process. Is it possible to limit the MySQL purging script to only process the given tables, @elukey? Otherwise, it might be easier to just drop the tables.
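To make the daily-run limitation concrete, here is a small Python sketch of the window a single run would cover. It assumes the run touches exactly one calendar day (the one that just crossed the retention threshold) and treats it as a half-open interval. The function names are hypothetical and not taken from the actual purging script.

```python
from datetime import date, timedelta


def daily_purge_window(today, retention_days=90):
    """Return (start, end) of the single day a daily run would purge.

    Assumption: the window is [today - 91 days, today - 90 days).
    """
    end = today - timedelta(days=retention_days)
    start = end - timedelta(days=1)
    return start, end


def touched_by_daily_run(event_day, today, retention_days=90):
    """True if a row from `event_day` falls inside today's purge window."""
    start, end = daily_purge_window(today, retention_days)
    return start <= event_day < end


if __name__ == "__main__":
    today = date(2018, 9, 10)
    print(daily_purge_window(today))
    # A row from 2014 is older than the window, so a daily run skips it.
    print(touched_by_daily_run(date(2014, 1, 1), today))
```

This is why rows already older than the window need either a one-off run from the beginning of time or a plain table drop.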

What should we do with this task?

This needs to be coordinated between me and @mforns when he is back from vacation. I'm going to put this task in our Incoming Backlog column so my team can triage it again.

elukey raised the priority of this task from Low to Needs Triage. Sep 10 2018, 8:09 AM
elukey moved this task from Radar to Incoming on the Analytics board.

Let's 1) stop purging, 2) drop all Echo tables in the events and events sanitized databases, 3) start purging again.

Makes sense. Also in MySQL, right? Or are those already deleted?

Looks like all the tables in the MySQL db also need to be deleted.
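For the MySQL side, a hedged Python sketch of how the drop statements could be generated from a list of table names. It only builds SQL strings and does not connect to any server. `drop_statements` is a hypothetical helper, and apart from Echo_7731316 (mentioned earlier in this task) the table names are made up. It matches on a literal `Echo_` prefix, because in a SQL `LIKE 'Echo_%'` pattern the underscore is itself a single-character wildcard and could match other schemas whose names start with "Echo".

```python
def drop_statements(table_names, prefix="Echo_"):
    """Build DROP TABLE statements for tables whose name starts with `prefix`."""
    statements = []
    for name in sorted(table_names):
        if name.startswith(prefix):
            # Double any backtick so the quoted identifier stays valid.
            escaped = name.replace("`", "``")
            statements.append(f"DROP TABLE IF EXISTS `{escaped}`;")
    return statements


if __name__ == "__main__":
    # Hypothetical table list; only Echo_7731316 comes from this task.
    tables = ["Edit_1", "Echo_7731316", "EchoInteraction_2"]
    for statement in drop_statements(tables):
        print(statement)
```

Reviewing the generated list before running anything keeps the drop limited to the Echo tables.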

Milimetric triaged this task as High priority.
Milimetric lowered the priority of this task from High to Medium.
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.
Milimetric added a project: Analytics-Kanban.

Change 467983 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/refinery@master] Remove Echo schema from EL sanitization white-list

Change 467983 merged by Mforns:
[analytics/refinery@master] Remove Echo schema from EL sanitization white-list

Tables dropped with Marcel on db110[7,8] (eventlogging master/slave). Marcel checked and nothing is there on HDFS.

The above code change has been merged, no more remaining actions!

elukey set the point value for this task to 3.