Monitor backup generation for failure or incorrect generation
Closed, Resolved, Public

Description

As part of the backup monitoring goal, we have to create a set of scripts to:

Generate metrics and historic data about databases (objects, table and wiki sizes, growth over time, etc)
Detect and alert on backup metrics anomalies

Event Timeline

jcrespo claimed this task.
jcrespo added a subscriber: jcrespo.

With the disclaimer that T205626: Document clearly the mariadb backup and recovery setup, T205627: Purge and monitor old metadata for the mariadb backups database, and T205628: Handle object metadata backups and compare it with stored database object inventory were not done (but were not part of the original scope either), we do have a new set of Icinga checks that make sure that backups happen regularly and correctly, and have reasonable sizes (e.g. backups are not 0 bytes).

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=Backup+of

Screenshot_20180927_165524.png (877×1 px, 329 KB)

The function needs a lot more polishing, but it works: for example, it detects that no backups are currently available in eqiad for s6, s7, s8 and x1. Those backups are missing due to a lack of hardware, which will be purchased next quarter.
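
To illustrate the kind of logic such a check might apply, here is a minimal sketch in Python. It is not the code that was deployed: the `Backup` record, the `check_backup` function, the 8-day freshness threshold and the 1-byte minimum size are all hypothetical and chosen only for illustration. The only fixed convention used is the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class Backup:
    """Hypothetical metadata record for the latest backup of one section."""
    section: str         # e.g. "s6" or "x1"
    datacenter: str      # e.g. "eqiad"
    finished: datetime   # when the backup completed
    size_bytes: int      # total size of the backup


def check_backup(latest: Optional[Backup], section: str, datacenter: str,
                 now: datetime,
                 max_age: timedelta = timedelta(days=8),   # assumed threshold
                 min_size_bytes: int = 1) -> Tuple[int, str]:
    """Return (exit_code, message) for one section in one datacenter."""
    label = f"Backup of {section} in {datacenter}"
    # No backup recorded at all, as for the sections without hardware.
    if latest is None:
        return CRITICAL, f"{label}: no backup found"
    # Backups must happen regularly: the latest one may not be too old.
    age = now - latest.finished
    if age > max_age:
        return CRITICAL, f"{label}: last backup is {age.days} days old"
    # Backups must have a reasonable size: reject empty ones.
    if latest.size_bytes < min_size_bytes:
        return CRITICAL, f"{label}: size is {latest.size_bytes} bytes"
    return OK, f"{label}: taken {age.days} days ago, {latest.size_bytes} bytes"
```

A real check would read the latest backup from the backups metadata database, print the message, and exit with the returned code so that Icinga can pick up the state. Passing `None` for a section with no recorded backup yields CRITICAL, which matches the behaviour described above for s6, s7, s8 and x1 in eqiad.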

Having said that, I consider the original project completed.