In T359067, ServiceOps and Machine Learning agreed on a short/medium-term fix to allow bigger image layers to be pushed to the Docker registry nodes.
The idea is to:
- Bump VM memory from 4GB to 6GB on all registry* nodes
- Also increase the size of nginx's tmpfs mountpoint on those nodes to 4GB. This is set via profile::nginx::tmpfs_size in hieradata/role/common/docker_registry_ha/registry.yaml
The docker-registry discovery record shows eqiad depooled and codfw pooled, so I could probably start with eqiad and then move to codfw.
After reading https://wikitech.wikimedia.org/wiki/Docker-registry/Runbook I am boldly making these assumptions:
- Shutting down the VMs in the depooled DC shouldn't require any extra steps, since the only important thing seems to be Swift replication and the VMs run stateless daemons.
- Upgrading codfw could be done one VM at a time without a failover, but to be safe we can fail over anyway to guarantee capacity (e.g. if Ganeti fails to bring up the resized VM).
Does the above make sense? Any more complete/sound procedures to follow?
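
To make the proposed ordering concrete, here is a rough sketch of the rollout plan described above. It only builds a list of human-readable steps and does not talk to Ganeti, conftool or any real tooling. The host names and the `rollout_plan` helper are hypothetical placeholders, and the pooled/depooled state is hard-coded from what the discovery record currently shows.

```python
from typing import Dict, List


def rollout_plan(
    hosts_by_dc: Dict[str, List[str]],
    pooled: Dict[str, bool],
    failover_first: bool = True,
) -> List[str]:
    """Return ordered steps: depooled DCs first, then pooled DCs, one VM at a time."""
    steps: List[str] = []
    # False sorts before True, so depooled DCs come first.
    for dc in sorted(hosts_by_dc, key=lambda d: pooled[d]):
        if pooled[dc] and failover_first:
            # Optional safety step, to guarantee capacity if a VM fails to come back.
            steps.append(f"{dc}: fail over traffic to another DC before touching VMs")
        for host in hosts_by_dc[dc]:
            steps.append(f"{dc}: resize {host} to 6GB, reboot, verify before continuing")
    return steps


# Hypothetical host names, for illustration only.
hosts = {
    "codfw": ["registry-codfw-a", "registry-codfw-b"],
    "eqiad": ["registry-eqiad-a", "registry-eqiad-b"],
}
state = {"eqiad": False, "codfw": True}  # eqiad depooled, codfw pooled

for step in rollout_plan(hosts, state):
    print(step)
```

Passing `failover_first=False` drops the failover step, which corresponds to the "one VM at a time without a failover" variant.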