Swift container for archived mariadb tables
Open, Needs Triage, Public

Description

We have many tables lingering in production that we no longer rely on but need to keep around for various reasons. These include a dedicated table in English Wikipedia's database for the 2005 board election that we are still keeping (2023 is about to end...), the cur table (T342772: Drop cur table in production), and many more random tables (moodbar_feedback, etc.).

It would be great to have a private swift container where we can store SQL dumps of these tables and then drop them from production.
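As a rough illustration of the dump-and-upload step, here is a minimal sketch that only builds the two command lines and does not run anything. The container name `archived-tables`, the object naming scheme, and the dump directory are hypothetical (no name has been agreed yet), and it assumes the stock `mysqldump` and `swift` command-line clients with Swift credentials supplied through the environment.

```python
import shlex


def build_archive_commands(database, table, container, dump_dir="/tmp"):
    """Return (dump_cmd, upload_cmd) as argv lists for archiving one table.

    Nothing is executed here; the caller decides when to run the commands
    and when it is safe to drop the table afterwards.
    """
    dump_path = f"{dump_dir}/{database}.{table}.sql"
    # Hypothetical naming scheme: one pseudo-folder per database.
    object_name = f"{database}/{table}.sql"
    dump_cmd = [
        "mysqldump",
        "--single-transaction",          # consistent snapshot for InnoDB tables
        f"--result-file={dump_path}",    # write the dump to a file
        database,
        table,
    ]
    upload_cmd = [
        "swift", "upload", container, dump_path,
        "--object-name", object_name,
    ]
    return dump_cmd, upload_cmd


if __name__ == "__main__":
    # "archived-tables" is a placeholder container name.
    dump_cmd, upload_cmd = build_archive_commands(
        "enwiki", "moodbar_feedback", "archived-tables"
    )
    print(shlex.join(dump_cmd))
    print(shlex.join(upload_cmd))
```

The table should only be dropped after the upload has been verified, for example by comparing checksums of the local dump and the stored object.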

To avoid scope creep (I can easily imagine this container turning into "Wikimedia Dropbox"), it should be accessible only to DBAs and store only mariadb tables (and relevant metadata), nothing more.

Backups were considered as an option but deemed unsuitable for this use case, as they are not designed to store data "forever".

Event Timeline

Technically, this is easy - we can make a swift account and away you go.

I don't want to tie anyone up in red tape, but I think it'd be good to have a lightweight process to ensure this doesn't just become a dustbin - so maybe we could have a wikitech page where we record, for each dataset stored, something like: (briefly) what it is; how long we should keep it for; who should have access; how big it is?
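To make the proposed per-dataset record concrete, here is a minimal sketch of the four fields rendered as a wikitext table row. The class name, field names, and the example values are all hypothetical placeholders, not real figures or an agreed format.

```python
from dataclasses import dataclass


@dataclass
class ArchivedDataset:
    """One entry on the (proposed) wikitech tracking page."""
    name: str         # object name in the swift container
    description: str  # briefly, what it is
    retention: str    # how long we should keep it for
    access: str       # who should have access
    size_bytes: int   # how big it is

    def to_wikitext_row(self) -> str:
        """Render the entry as a single MediaWiki table row."""
        size_mib = self.size_bytes / (1024 * 1024)
        cells = [
            self.name,
            self.description,
            self.retention,
            self.access,
            f"{size_mib:.1f} MiB",
        ]
        return "|-\n| " + " || ".join(cells)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    entry = ArchivedDataset(
        name="enwiki/moodbar_feedback.sql",
        description="Old extension table, no longer used",
        retention="indefinite",
        access="DBAs only",
        size_bytes=5 * 1024 * 1024,
    )
    print(entry.to_wikitext_row())
```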

Sounds good to me, but access is always limited to DBAs. (If you mean who DBAs can hand the data over to, that gets complicated in some cases.)

I did mean the latter - presumably we're keeping these because we think they might be useful to someone in the future, and it'd be useful for future-us to know straightforwardly if there are restrictions on who we could show the data to.

The biggest problem with that is reorgs: a lot of the teams we might set as owners of something may not exist in a couple of years. Generally, I think it's better to leave it to the discretion of the DBAs when the time comes.

What should we do to move this forward?

I think we've broadly agreed the "process"; do you want to put together a wikitech page with that (and the initial data set(s)) on it? If you suggest a name for the swift account, I'll get it made for you.