
Backed-up large storage?
Open, Needs Triage, Public

Description

Some of my Toolforge MariaDB databases have very long tables, where much of the contents could be "archived" but should still be available.

One solution I have found for "glamtools" to store view data is to group the data, then generate individual sqlite3 database files for each group.
MariaDB is used to keep track of these files, and to store some legacy data.
This makes managing data a bit tricky, but it works.
However, I am storing the sqlite files in the tool path, and currently they take up 47GB ( /data/project/glamtools/viewdata ).
You might not want all that in a single MariaDB table (which would be the alternative).
This works fine for me, but it might be suboptimal from the NFS perspective. I don't know how big or efficient the Toolforge NFS is, though it can drag a bit at times.
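
For readers unfamiliar with the pattern described above (group the data, write one sqlite3 file per group, keep track of the files in a database), here is a minimal sketch. The function name `archive_by_group`, the `(group_key, page, day, views)` row layout, and the `archive_file` index table are all hypothetical and not taken from glamtools. The index is itself a SQLite file only so the example runs standalone; in the setup described here that role is played by MariaDB.

```python
import sqlite3
from collections import defaultdict
from pathlib import Path


def archive_by_group(rows, out_dir, index_path):
    """Write rows into one SQLite file per group and record each file in an index.

    rows: iterable of (group_key, page, day, views) tuples (hypothetical schema).
    group_key is assumed to be safe to use as a file name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows in memory; a real job would stream them from the source table.
    grouped = defaultdict(list)
    for group_key, page, day, views in rows:
        grouped[group_key].append((page, day, views))

    # Index of archive files (stand-in for the MariaDB tracking table).
    index = sqlite3.connect(str(index_path))
    index.execute(
        "CREATE TABLE IF NOT EXISTS archive_file ("
        "group_key TEXT PRIMARY KEY, path TEXT NOT NULL, row_count INTEGER NOT NULL)"
    )

    for group_key, group_rows in grouped.items():
        path = out_dir / f"{group_key}.sqlite3"
        db = sqlite3.connect(str(path))
        db.execute(
            "CREATE TABLE IF NOT EXISTS views (page TEXT, day TEXT, views INTEGER)"
        )
        db.executemany("INSERT INTO views VALUES (?, ?, ?)", group_rows)
        db.commit()
        row_count = db.execute("SELECT COUNT(*) FROM views").fetchone()[0]
        db.close()

        # Record (or refresh) the index entry for this group's file.
        index.execute(
            "INSERT OR REPLACE INTO archive_file VALUES (?, ?, ?)",
            (group_key, str(path), row_count),
        )

    index.commit()
    index.close()
    return sorted(grouped)
```

Looking up a group then means reading its `path` from the index and opening only that one file, which is what keeps each individual query small at the cost of the extra bookkeeping mentioned above.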

I am thinking of doing the same to older QuickStatements batches, where the command table is now at 9GB.

Is there/could there be a large, backed-up area on NFS for this purpose?
Or a read/write NFS mount for an object store (T225190)?
I'm happy to do this in the tool path, but maybe there is a better way?

Event Timeline

jcrespo subscribed.

47GB is not *that* large for a tool database, although it would be nice for you if you could split it somewhere logically (assuming you need to query it, and it is not historical data/logs). I think we start to ask questions when a database goes over 100GB. Creating it as ENGINE=InnoDB, ROW_FORMAT=COMPRESSED would help with size.
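
As an illustration of the compression suggestion, the sketch below builds the statement that would convert an existing table to compressed InnoDB. The helper name `compressed_alter_statement` is hypothetical, and `command` is used only as an example table name. How much space this saves depends on the data and the server configuration, so treat it as something to try on a copy first.

```python
import re


def compressed_alter_statement(table_name):
    """Return an ALTER TABLE statement switching a table to compressed InnoDB.

    Only builds the SQL text; running it against MariaDB is left to the caller.
    """
    # Accept plain identifiers only, so the name can be quoted safely.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"ALTER TABLE `{table_name}` ENGINE=InnoDB ROW_FORMAT=COMPRESSED;"


print(compressed_alter_statement("command"))
```

Rebuilding a multi-gigabyte table this way can take a while, so it is probably best run when the tool is idle.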

If it is queried, split it across several tables in MariaDB; if it is for historical purposes, Data-Services people will be able to advise you better. There is no backup solution offered for tools data, as cloud infra is single-DC, and proper backups should be stored offsite (among other reasons).

We have hopes of providing archival storage in the future (T209530: Build user data backup service based on remote sync rather than NFS). However, the intent of this service will be disaster recovery, not "extra" data storage.