Implement database binary backups in the production infrastructure:
We have working and monitored logical backups ("dumps") of all important metadata on the mediawiki and misc database servers, but recovering from them, while highly flexible, is slow: recovering from a full section dump may take from 3 up to 24 hours.
A less flexible but faster recovery method is needed, in addition to the logical backups, for the most critical parts of the infrastructure: "physical backups", "binary backups" or "snapshots", that is, images of the files in their original format that can be loaded quickly without requiring any transformation. Options for them could be:
- xtrabackup (or its mariadb version, mariabackup)
- cold backups
- volume snapshots
- delayed slaves
Research was done to decide which option is the most appropriate for our infrastructure, and we need to implement a proof of concept (without needing full coverage) that validates the decision taken and measures the Time to Recovery (TTR). mariabackup/xtrabackup worked for us, at least in custom testing, and we will move forward with it (xtrabackup stopped working with MariaDB, which is why mariabackup was being tested).
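As a rough illustration of the chosen approach, one snapshot run with mariabackup follows the tool's standard two-phase workflow (--backup, then --prepare). The wrapper below is only a sketch: the target path, the backup_user account, and the MARIABACKUP indirection (which lets the commands be dry-run with echo) are hypothetical.

```shell
#!/bin/bash
# Sketch of one physical backup run with mariabackup.
# MARIABACKUP can be overridden (e.g. with `echo`) for a dry run.
set -euo pipefail
MARIABACKUP="${MARIABACKUP:-mariabackup}"

take_snapshot() {
    local target="$1"
    # Phase 1: copy the datadir files while the server keeps running.
    "$MARIABACKUP" --backup --target-dir="$target" --user=backup_user
    # Phase 2: apply the redo log so the copy is consistent and can later
    # be copied back into a datadir without further transformation.
    "$MARIABACKUP" --prepare --target-dir="$target"
}
```

Recovery would then be mariabackup --copy-back into an empty datadir plus fixing file ownership; the elapsed time of that step is essentially the TTR the proof of concept should measure.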
The snapshots could also be used in the future for automatic provisioning, although that is for the moment out of the scope of this goal.
In order to fully implement this system, extra hardware is also needed for both snapshot taking and long-term storage; it will need to be acquired and provisioned accordingly.
- Design the final architecture, down to the physical layer
- Procure hardware for binary backups
- Implement a more or less final snapshot cycle automation for the mediawiki metadata and misc databases
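A snapshot cycle also needs a retention step so long-term storage does not fill up. The function below is a minimal sketch of such a step, assuming snapshot directories are named so that they sort chronologically (e.g. 2024-01-31_02-00-00); the root path and the keep count are placeholders, not a decided policy.

```shell
#!/bin/bash
# Hypothetical retention step for the snapshot cycle: keep the newest
# $2 snapshots under $1 and delete the rest.
set -euo pipefail

prune_snapshots() {
    local root="$1" keep="$2"
    # List snapshot directories newest-first, skip the first $keep
    # entries, and remove everything older.
    ls -1 "$root" | sort -r | tail -n +"$((keep + 1))" | while read -r old; do
        rm -rf "${root:?}/${old}"
    done
}
```

For example, prune_snapshots /srv/backups/snapshots 7 would keep the seven most recent snapshots under that (hypothetical) root.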