Today I noticed that the logical backups keep failing again on production, backup buckets are empty because of the 7 day retain policy.
- The last sql-logic-backup pods that are completed and are still around on the prod cluster are from Sep 19th/20
- The oldest failing one that is still around is from Oct 3rd, the youngest one from today (Oct 4th)
I suspect that the currently available ephemeral storage is yet again to little, since the backup keeps growing in size (currently allocatable are 26.78 GB out of 63.16 GB).
ACs:
- Logical SQL backups work again
- E-Mail alerting for failed logical backups is in place
- E-Mail alerting for logical backup disk space below threshold
PRs:
increase disk size - https://github.com/wmde/wbaas-deploy/pull/1179- increase scratch volume size - https://github.com/wmde/wbaas-deploy/pull/1181
disk usage log: https://github.com/wmde/wbaas-backup/pull/27- disk usage metric & policy https://github.com/wmde/wbaas-deploy/pull/1185
- deploy to staging https://github.com/wmde/wbaas-deploy/pull/1190
- deploy to production https://github.com/wmde/wbaas-deploy/pull/1191