| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | fsero | T202504 Evaluate VMWare's Harbour as a docker registry |
| Resolved | | JMeybohm | T212123 Kubernetes clusters roadmap |
| Resolved | | elukey | T209271 improve docker registry architecture |
| Resolved | | fsero | T222210 placeholder task for migration problems |
Event Timeline
On contint1001, using docker-pkg, I have created a new image, docker-registry.wikimedia.org/releng/tox:0.4.0.
On a WMCS instance, I am unable to pull that image: the manifest is not found.
Eventually, after some time, it appeared in the catalog at: https://tools.wmflabs.org/dockerregistry/releng/tox/tags/
Later attempts to fetch it yield errors such as:
0.4.0: Pulling from releng/tox
8d22d214682d: Already exists
dd5d82f356b7: Already exists
6adceec4ad0e: Pulling fs layer
95587f1fdaf7: Pulling fs layer
dadbf101c83c: Pulling fs layer
6adceec4ad0e: Retrying in 5 seconds
error pulling image configuration: unknown blob
or
0.4.0: Pulling from releng/tox
8d22d214682d: Already exists
dd5d82f356b7: Already exists
6adceec4ad0e: Pulling fs layer
95587f1fdaf7: Pulling fs layer
dadbf101c83c: Pulling fs layer
error pulling image configuration: received unexpected HTTP status: 503 Backend fetch failed
On contint1001, I can pull the image if I refer to docker-registry.discovery.wmnet.
@fsero explained it might be due to slow cross-DC replication.
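For anyone wanting to confirm that the two endpoints disagree, here is a minimal sketch that asks each registry for the same manifest and reports the HTTP status it gets back. It assumes both hosts expose the standard Docker Registry HTTP API v2 manifest endpoint without authentication, which I have not verified for these registries. The function names (`manifest_url`, `http_status`, `compare_registries`) are made up for illustration.

```python
import urllib.error
import urllib.request

# Media type for Docker image manifests (schema 2).
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_url(registry, repository, tag):
    """Build the Registry HTTP API v2 manifest URL for an image tag."""
    return f"https://{registry}/v2/{repository}/manifests/{tag}"


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, including 4xx/5xx codes."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"Accept": MANIFEST_V2}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Error responses (e.g. 404 for an unknown manifest) land here.
        return error.code


def compare_registries(registries, repository, tag, fetch=http_status):
    """Map each registry host to the status it returns for the manifest.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    return {host: fetch(manifest_url(host, repository, tag)) for host in registries}
```

Calling `compare_registries(["docker-registry.wikimedia.org", "docker-registry.discovery.wmnet"], "releng/tox", "0.4.0")` from a host that can reach both names should show whether one side still lacks the manifest. A 404 on one host and a 200 on the other would be consistent with the replication-lag explanation, though it would not prove it.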
Mentioned in SAL (#wikimedia-releng) [2019-05-13T14:48:33Z] <hashar> if you build Docker containers, there is a long delay between it being build/published and it actually being available https://phabricator.wikimedia.org/T222210#5176863 known issue
I see something slightly different when I try to pull locally:
docker pull docker-registry.wikimedia.org/releng/quibble-stretch-php70:0.0.31-3
Error response from daemon: manifest for docker-registry.wikimedia.org/releng/quibble-stretch-php70:0.0.31-3 not found: manifest unknown: manifest unknown
This is ~60 minutes after the image was built.
@fsero I am afraid we will need a hotfix to make it much faster. Would it be possible to temporarily switch docker-registry.wikimedia.org to use the main/master registry? The intent is to have images available as soon as possible after they are published.
@hashar the CR is already there: https://gerrit.wikimedia.org/r/c/operations/puppet/+/509879. It just needs a +1 from Traffic and I'll merge it.
I can confirm that instances are now able to fetch new containers immediately after they have been published. So that solves it for me. Thank you for the very quick fix up!
The new registry has been in production for some time without issues. There are some leftovers that need to be addressed; I'll open subtasks for that.