Currently, if a disaster hits the Toolforge docker registry (VM shutdown, data corruption, etc.), we have to rebuild all docker images and push them to the registry again, which may take a long time and means downtime.
The proposed new HA method for the Toolforge docker registry is a simple cold standby:
* docker images are built and pushed to the active registry node (using the DNS name `docker-registry.tools.wmflabs.org`)
* the active registry node stores the image locally (usually `/srv/registry`)
* a daily cron job running on the standby node rsyncs the registry data from the active node
* if disaster strikes the active registry node (VM shutdown, corruption, etc.), we can switch the main DNS name to the standby node and start the docker registry daemon there
* we may lose the data pushed to the registry since the last sync. That is easily solved by pushing those docker images again: only a few of them, instead of all.
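
The sync and recovery steps above can be sketched as follows. This is only a minimal illustration, not the final puppet-managed implementation: the host name `registry-active.example`, the helper names, and the choice of rsync over SSH with `-a --delete` are all assumptions. The first helper builds the rsync command that the standby node's cron job could run; the second works out which images would need to be pushed again after a failover, given their push times and the time of the last sync.

```python
from datetime import datetime

# Hypothetical values for illustration; not taken from the real deployment.
ACTIVE_HOST = "registry-active.example"  # assumed name of the active node
REGISTRY_DIR = "/srv/registry"           # usual registry data directory


def build_rsync_command(active_host=ACTIVE_HOST, registry_dir=REGISTRY_DIR):
    """Return the rsync argv the standby node would run to pull registry data."""
    # Trailing slashes make rsync copy the directory contents, not the directory.
    src = f"{active_host}:{registry_dir}/"
    dst = f"{registry_dir}/"
    # -a keeps permissions and timestamps; --delete removes files on the
    # standby that no longer exist on the active node.
    return ["rsync", "-a", "--delete", src, dst]


def images_to_repush(push_times, last_sync):
    """Return the sorted names of images pushed after the last sync."""
    return sorted(name for name, pushed in push_times.items() if pushed > last_sync)


if __name__ == "__main__":
    print(" ".join(build_rsync_command()))
    # Made-up push times, only to show the shape of the input.
    pushes = {
        "example-image-a": datetime(2020, 1, 1, 9, 0),
        "example-image-b": datetime(2020, 1, 2, 15, 30),
    }
    print(images_to_repush(pushes, last_sync=datetime(2020, 1, 2, 0, 0)))
```

In a real cron job the command list could be handed to `subprocess.run(cmd, check=True)`. Note that `--delete` also propagates deletions from the active node, so whether to use it is a design choice to settle when writing the puppet code.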
This cold-standby mechanism, even if not perfect from the automation point of view, is a robust improvement over the current situation.
It is also really simple to implement. The missing bits are:
* [] rsync puppet code
* [] admin docs generation in wikitech