Incident report: https://wikitech.wikimedia.org/wiki/Incidents/2022-05-31_Analytics_Data_Lake_-_Hadoop_Namenode_failure
- [x] Make sure old journalnode edits files are cleaned up properly now that the namenodes are back online and saving fs image snapshots.
- [x] Reduce `profile::hadoop::backup::namenode::fsimage_retention_days`, 20 is too many
-  Possibly separate image backup storage from namenode data storage partitions
-  `hdfs dfsadmin -fetchImage` should have kept failing and not recovered.
-  Gobblin did not fail with proper error codes while the NameNodes were offline
- [x] Make sure journalnodes alert sooner about disk space on the journalnode partition
- [x] Check that bacula backups of fs image snapshots are available and usable
- [x] Check that the disk space alerting is correct on an-master hosts, since we do not seem to have been alerted when `/srv/` became full on an-master1002
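
As a rough illustration of the disk space checks mentioned above, the following is a minimal sketch of a threshold check on a partition. It is not the alerting actually deployed on these hosts: the function names and the 80%/90% thresholds are hypothetical, chosen only for the example.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(percent_used, warn_pct=80.0, crit_pct=90.0):
    """Map a usage percentage to a status string.

    The thresholds are hypothetical defaults, not values taken from
    the real alerting configuration.
    """
    if percent_used >= crit_pct:
        return "CRITICAL"
    if percent_used >= warn_pct:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # "/srv" is the partition named in the list above; fall back to "/"
    # so the sketch still runs on a host without that mount point.
    for candidate in ("/srv", "/"):
        try:
            pct = disk_usage_percent(candidate)
        except FileNotFoundError:
            continue
        print(f"{candidate}: {pct:.1f}% used, status {classify(pct)}")
        break
```

Running a check like this by hand on a host such as an-master1002 is one way to confirm that the thresholds used by the real alerts would have fired before the partition filled up.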