This table is now 33GB on disk. Perhaps we can delete stuff older than 1 year or similar.
Right now the oldest row is from July 2018
Thoughts?
Subject | Repo | Branch | Lines +/-
---|---|---|---
dbbackups: Create temporary fileset for dbprov for dbbackups archival | operations/puppet | production | +8 -0
Yes, or at least archive it and compress it. Historical records of table size may be useful for generating graphs such as: https://phabricator.wikimedia.org/T343131 cc @Ladsgroup
Mentioned in SAL (#wikimedia-operations) [2023-10-20T11:42:03Z] <jynus> refactoring tables @ db1164[bbackups] T349360
I've split the table into two: backup_files and backup_files_history, holding this year's data and the older data, respectively:
-rw-rw---- 1 mysql mysql 25979518976 Oct 20 13:20 backup_files_history.ibd
-rw-rw---- 1 mysql mysql  6887047168 Oct 20 12:15 backup_files.ibd
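For reference, a split like this can be sketched roughly as follows (the column name `backup_date` and the cutoff are assumptions, and in practice a chunked copy such as pt-archiver is preferable to a single pass, to avoid long locks):

```sql
-- Sketch only: create the history table with the same structure as the
-- live one ("backup_date" and the cutoff date are assumed, not confirmed).
CREATE TABLE backup_files_history LIKE backup_files;

-- Copy the pre-2023 rows into the history table, then remove them
-- from the live table.
INSERT INTO backup_files_history
SELECT * FROM backup_files WHERE backup_date < '2023-01-01';

DELETE FROM backup_files WHERE backup_date < '2023-01-01';
```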
I will discuss next with Ladsgroup whether to drop, archive or summarize the old one, as well as how, and how to automate it moving forward. This was already tracked at: T205627
I really need the historical data for reporting, figuring out trends and much more, so completely dropping it would hurt a lot of work :( That being said, we can do a lot; e.g. we could drop the codfw backups in the history table? That alone would cut the size in half.
Another option is to split them by year, build a dump file and put the dump in Swift as the data store. We need a way for archived tables to be stored and accessed, instead of keeping them in production DBs forever.
@Ladsgroup Yeah, I thought so. Let's both try to come up with a strategy that is relatively simple to implement in automation, and discuss it next Monday: either reducing granularity or archiving (for which there is currently no mechanism in place).
Before I can get the swift container running, I suggest just dropping every row for tables that end with _edits (T355594). It's not useful and it's taking A LOT of space.
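Something along these lines could do that (a sketch only: the column name `file_name` is an assumption, and note the underscore must be escaped in LIKE so it matches literally rather than as a single-character wildcard):

```sql
-- Sketch: delete rows referring to tables whose name contains "_edits".
-- "file_name" is an assumed column name; "\_" escapes the underscore.
DELETE FROM backup_files_history
WHERE file_name LIKE '%\_edits%';
```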
Change 992755 had a related patch set uploaded (by Jcrespo; author: Jcrespo):
[operations/puppet@production] dbbackups: Create temporary fileset for dbprov for dbbackups archival
Change 992755 merged by Jcrespo:
[operations/puppet@production] dbbackups: Create temporary fileset for dbprov for dbbackups archival
I did a long term archival backup of records from 2022 and before into the Archival section of backups:
Terminated Jobs:
 JobId  Level    Files      Bytes  Status  Finished         Name
====================================================================
549741  Full        25    2.959 G  OK      24-Jan-24 17:01  dbprov1002.eqiad.wmnet-Weekly-Mon-ArchiveEqiad-mysql-dbbackups-archival

$ ls
dbbackups-schema-create.sql.gz
dbbackups.backup_files_history-schema.sql.gz
dbbackups.backup_files_history.00000.sql.gz
dbbackups.backup_files_history.00001.sql.gz
dbbackups.backup_files_history.00002.sql.gz
dbbackups.backup_files_history.00003.sql.gz
dbbackups.backup_files_history.00004.sql.gz
dbbackups.backup_files_history.00005.sql.gz
dbbackups.backup_files_history.00006.sql.gz
dbbackups.backup_files_history.00007.sql.gz
dbbackups.backup_files_history.00008.sql.gz
dbbackups.backup_files_history.00009.sql.gz
dbbackups.backup_files_history.00010.sql.gz
dbbackups.backup_files_history.00011.sql.gz
dbbackups.backup_files_history.00012.sql.gz
dbbackups.backup_files_history.00013.sql.gz
dbbackups.backup_files_history.00014.sql.gz
dbbackups.backup_files_history.00015.sql.gz
dbbackups.backup_files_history.00016.sql.gz
dbbackups.backup_files_history.00017.sql.gz
dbbackups.backups_history-schema.sql.gz
dbbackups.backups_history.sql.gz
metadata

*list volumes pool=ArchiveEqiad
+---------+------------------+-----------+---------+---------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+-------------+
| MediaId | VolumeName       | VolStatus | Enabled | VolBytes      | VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType | VolType | VolParts | LastWritten         | ExpiresIn   |
+---------+------------------+-----------+---------+---------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+-------------+
| 1,180   | archiveEqiad1180 | Append    | 1       | 2,962,107,429 | 0        | 157,680,000  | 1       | 0    | 0         | File      | 1       | 0        | 2024-01-24 17:01:34 | 157,679,352 |
+---------+------------------+-----------+---------+---------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+-------------+
As far as I am concerned (self-declared data hoarder, :-P), we could now run DROP or TRUNCATE TABLE on backup_files_history/backups_history, or whatever operation you prefer, on the m1 primary (dbbackups).
This is something that could be done yearly (?)- archiving records from the current_year - 2.
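The yearly rotation could then be the same move repeated with a sliding cutoff. A rough sketch, where `backup_date` is an assumed column name and truncation only happens after the archival job has been verified:

```sql
-- Sketch of the yearly step: move everything older than current_year - 2
-- from the live table into the history table.
SET @cutoff = DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 YEAR), '%Y-01-01');

INSERT INTO backup_files_history
SELECT * FROM backup_files WHERE backup_date < @cutoff;

DELETE FROM backup_files WHERE backup_date < @cutoff;

-- Then, once the Bacula archival of backup_files_history has succeeded
-- and been verified, the production copy can be emptied:
TRUNCATE TABLE backup_files_history;
```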
Of course, I won't do it on my own unless you tell me to; I'm just declaring that this is now only blocked on whatever Ladsgroup or the #DBAs may additionally want to do.
Assigning it to him to mark this as "not blocked on me", but feel free to unclaim it, reassign it, etc.
I'd appreciate a recovery of this as a SQL dump file somewhere; it doesn't store anything highly sensitive that would need to be encrypted in our production. Then I can go ahead with dropping it in production.