
placeholder task for migration problems
Closed, ResolvedPublic

Event Timeline

fsero created this task. Apr 30 2019, 5:19 PM
Restricted Application removed a project: Patch-For-Review. Apr 30 2019, 5:19 PM
hashar added a subscriber: hashar. May 13 2019, 2:31 PM

On contint1001, using docker-pkg, I have created a new image docker-registry.wikimedia.org/releng/tox:0.4.0.

On a WMCS instance, I am unable to pull that image: the manifest is not found.

Eventually, after some time, it appeared in the catalog at: https://tools.wmflabs.org/dockerregistry/releng/tox/tags/

Later attempts to fetch it yield errors such as:

0.4.0: Pulling from releng/tox
8d22d214682d: Already exists
dd5d82f356b7: Already exists
6adceec4ad0e: Pulling fs layer
95587f1fdaf7: Pulling fs layer
dadbf101c83c: Pulling fs layer
6adceec4ad0e: Retrying in 5 seconds
error pulling image configuration: unknown blob

or

0.4.0: Pulling from releng/tox
8d22d214682d: Already exists
dd5d82f356b7: Already exists
6adceec4ad0e: Pulling fs layer
95587f1fdaf7: Pulling fs layer
dadbf101c83c: Pulling fs layer
error pulling image configuration: received unexpected HTTP status: 503 Backend fetch failed

On contint1001, I can pull the image if I refer to docker-registry.discovery.wmnet.

@fsero explained it might be due to slow cross-DC replication.
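While replication lags, one way to tell "not there yet" from a real failure is to poll the registry for the manifest rather than retrying `docker pull` by hand. The sketch below is a hypothetical helper, not part of docker-pkg or any Wikimedia tooling: it assumes the registry speaks the Docker Registry HTTP API V2 (`HEAD /v2/<repository>/manifests/<tag>`) and allows anonymous read access. It treats any HTTP or network error (such as the 404 "manifest unknown" or the 503 "Backend fetch failed" seen above) as "not available yet". The function names, the 30-second interval and the attempt limit are illustrative choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Media type for Docker image manifest schema 2 (Registry HTTP API V2).
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_url(registry: str, repository: str, tag: str) -> str:
    """Build the Registry HTTP API V2 manifest URL for repository:tag."""
    return f"https://{registry}/v2/{repository}/manifests/{tag}"


def manifest_exists(registry: str, repository: str, tag: str,
                    timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for the manifest answers 200.

    Assumes anonymous read access; registries that require a bearer
    token would answer 401 here and be reported as "not available".
    """
    req = urllib.request.Request(
        manifest_url(registry, repository, tag),
        method="HEAD",
        headers={"Accept": MANIFEST_V2},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        # Covers HTTPError (404 manifest unknown, 503 backend fetch failed)
        # as well as connection failures: all mean "not yet".
        return False


def wait_for_manifest(check: Callable[[], bool],
                      interval: float = 30.0,
                      max_attempts: int = 120,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Poll check() until it returns True.

    Returns the 1-based attempt number that succeeded, or None if the
    manifest never appeared within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(interval)
    return None
```

Example usage (performs real network requests):

```python
attempt = wait_for_manifest(
    lambda: manifest_exists("docker-registry.wikimedia.org", "releng/tox", "0.4.0"))
```

Running the same check against `docker-registry.discovery.wmnet` and the public hostname side by side would show how far the public endpoint trails the primary registry.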

Mentioned in SAL (#wikimedia-releng) [2019-05-13T14:48:33Z] <hashar> if you build Docker containers, there is a long delay between it being built/published and it actually being available https://phabricator.wikimedia.org/T222210#5176863 known issue

I see something slightly different when I try to pull locally:

docker pull docker-registry.wikimedia.org/releng/quibble-stretch-php70:0.0.31-3
Error response from daemon: manifest for docker-registry.wikimedia.org/releng/quibble-stretch-php70:0.0.31-3 not found: manifest unknown: manifest unknown

This is ~60 minutes after the image was built.

@fsero I am afraid we will need a hotfix to make this much faster. Would it be possible to temporarily switch docker-registry.wikimedia.org to use the main/master registry? The intent is to have images available as soon as possible after they are published.

fsero added a comment.May 13 2019, 4:29 PM

@hashar the CR is already up at https://gerrit.wikimedia.org/r/c/operations/puppet/+/509879; it just needs a +1 from Traffic and then I'll merge it.

I can confirm that instances are now able to fetch new containers immediately after they have been published. So that solves it for me. Thank you for the very quick fix up!

fsero closed this task as Resolved.Jun 20 2019, 2:08 PM

The new registry has been in production for some time without issues. There are some leftovers that need to be addressed; I'll open subtasks for them.