Page MenuHomePhabricator

Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push
Closed, ResolvedPublic

Description

(edit: was "WMF container registry does not accept a manifest list (aka OCI manifest index, or "fat" manifest" but retitled as this seems to be a condition on the client side)


When attempting to publish a multi-platform image to the WMF container registry, buildctl/BuildKit fails with: error: failed to solve: failed commit on ref "manifest-sha256:b1f76c874d0417382df17036e0f8a732281762fb85694d02378e3a175d1c103f": cannot reuse body, request must be retried

See https://gitlab.wikimedia.org/repos/releng/blubber/-/jobs/28242

...
#27 exporting to image
#27 exporting layers
#27 exporting layers 0.9s done
#27 exporting manifest sha256:b1f76c874d0417382df17036e0f8a732281762fb85694d02378e3a175d1c103f done
#27 exporting config sha256:268701062a811129e68e6d0f9a53bdd64fea1e8e083a955f4cfc98aacd591322 done
#27 exporting manifest sha256:76c748e512209fca21610a44789f4911e8ac6526a93210678a44398c9fb28041 done
#27 exporting config sha256:81c11b26f86c1d88d784580e9d7567714347712696a637d4066ab9e63bdb9ca0 done
#27 exporting manifest list sha256:9828d7ddccdf65bde8b25d05794309794c2c3b87d185fe0d00ff4824ef47f503 done
#27 pushing layers
#27 pushing layers 2.2s done
#27 pushing manifest for docker-registry.discovery.wmnet/repos/releng/blubber/buildkit:v0.14.0@sha256:9828d7ddccdf65bde8b25d05794309794c2c3b87d185fe0d00ff4824ef47f503
#27 pushing manifest for docker-registry.discovery.wmnet/repos/releng/blubber/buildkit:v0.14.0@sha256:9828d7ddccdf65bde8b25d05794309794c2c3b87d185fe0d00ff4824ef47f503 0.3s done
#27 ERROR: failed commit on ref "manifest-sha256:b1f76c874d0417382df17036e0f8a732281762fb85694d02378e3a175d1c103f": cannot reuse body, request must be retried
------
 > exporting to image:
------
error: failed to solve: failed commit on ref "manifest-sha256:b1f76c874d0417382df17036e0f8a732281762fb85694d02378e3a175d1c103f": cannot reuse body, request must be retried

This may be related to an upstream bug in containerd. Note that we're using OCI workers for building, not containerd. However, containerd is used to push.

There seems to have been a fix merged as well, so I'll see about getting this into our BuildKit build.

Event Timeline

I think this might require an update of our docker-registry (which we have not planned for currently). Is this blocking something apart from testing with multi arch images (T272500)?

I think this might require an update of our docker-registry (which we have not planned for currently). Is this blocking something apart from testing with multi arch images (T272500)?

It's possible this blocks T321316: Self-build and publish buildkit helper images as the dockerfile-copy image we're currently trying to vendor/rebuild is a multi-platform image.

Do you see anything in the registry access logs from gitlab-runner1004.eqiad.wmnet around 2022-11-04T22:43:00Z?

I took a quick look and AIUI our registry does support application/vnd.docker.distribution.manifest.list.v2+json already. That is really not a new thing and we are not that far behind upstream releases.

@hashar triggered a retry of said job for me (thanks!) with only one registry pooled to rule out strange caching behavior, these are the logs we got. Nothing jumping my eye apart from the fact that I don't see the client trying to PUT the manifest in question:

Nov  8 09:39:25 registry2003 docker-registry[7573]: time="2022-11-08T09:39:25.27595589Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=docker-registry.wikimedia.org http.request.id=8e8f0f24-b179-4043-8cc9-880a4298a988 http.request.method=GET http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/manifests/v0.13.1" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=160.805238ms http.response.status=200 http.response.written=739                                 
Nov  8 09:39:25 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:39:25 +0000] "GET /v2/repos/releng/blubber/buildkit/manifests/v0.13.1 HTTP/1.1" 200 739 "" "buildkit/v0.10-dev"                                                                                                                      
Nov  8 09:39:25 registry2003 docker-registry[7573]: time="2022-11-08T09:39:25.409484236Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=registry http.request.id=40b2e36b-84be-4827-8685-39e0ca4422a0 http.request.method=GET http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/manifests/sha256:2c1df3c60855e46964059eeaa48ecb498427e8f1f277f91f2983bf609e479d67" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=97.259281ms http.response.status=200 http.response.written=739
Nov  8 09:39:25 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:39:25 +0000] "GET /v2/repos/releng/blubber/buildkit/manifests/sha256:2c1df3c60855e46964059eeaa48ecb498427e8f1f277f91f2983bf609e479d67 HTTP/1.1" 200 739 "" "buildkit/v0.10-dev"                                                      
Nov  8 09:39:25 registry2003 docker-registry[7573]: time="2022-11-08T09:39:25.592261668Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=registry http.request.id=0e9d542f-47d4-4d67-ad9a-30839f2ce64e http.request.method=GET http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/sha256:adada3f34e5ac246bfc1947e147a235ecd258627b20e27888ec957f39c2dcc7e" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/octet-stream" http.response.duration=145.370514ms http.response.status=206 http.response.written=1766                    
Nov  8 09:39:25 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:39:25 +0000] "GET /v2/repos/releng/blubber/buildkit/blobs/sha256:adada3f34e5ac246bfc1947e147a235ecd258627b20e27888ec957f39c2dcc7e HTTP/1.1" 206 1766 "" "buildkit/v0.10-dev"                                                         
Nov  8 09:39:26 registry2003 docker-registry[7573]: time="2022-11-08T09:39:26.805306103Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=docker-registry.wikimedia.org http.request.id=f88c12b3-a049-438d-84ae-12d55a9a035c http.request.method=GET http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/golang1.17/manifests/1.17-1" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=143.198981ms http.response.status=200 http.response.written=1156                                                   
Nov  8 09:39:26 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:39:26 +0000] "GET /v2/golang1.17/manifests/1.17-1 HTTP/1.1" 200 1156 "" "buildkit/v0.10-dev"                                                                                                                                         
Nov  8 09:39:26 registry2003 docker-registry[7573]: time="2022-11-08T09:39:26.916429734Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=registry http.request.id=19acd260-7439-46e1-9884-416adc145677 http.request.method=GET http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/golang1.17/manifests/sha256:7dc97c16237f2265ed6932e50ea50da5b2a9e39c5b094efb04394ca3f0acc72d" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=80.839548ms http.response.status=200 http.response.written=1156        
Nov  8 09:39:26 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:39:26 +0000] "GET /v2/golang1.17/manifests/sha256:7dc97c16237f2265ed6932e50ea50da5b2a9e39c5b094efb04394ca3f0acc72d HTTP/1.1" 200 1156 "" "buildkit/v0.10-dev"                                                                        
Nov  8 09:39:26 registry2003 docker-registry[7573]: time="2022-11-08T09:39:26.949655822Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=docker-registry.wikimedia.org http.request.id=ccf9671f-efdc-400d-86cc-16f03bd04894 http.request.method=HEAD http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/golang1.17/manifests/1.17-1" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=137.910554ms http.response.status=200 http.response.written=1156                                                  
Nov  8 09:39:26 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:39:26 +0000] "HEAD /v2/golang1.17/manifests/1.17-1 HTTP/1.1" 200 1156 "" "buildkit/v0.10-dev"                                                                                                                                        
Nov  8 09:39:27 registry2003 docker-registry[7573]: time="2022-11-08T09:39:27.051731658Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=registry http.request.id=fa8b407c-4a2f-4de3-a96c-7d0276574296 http.request.method=GET http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/golang1.17/blobs/sha256:81c44e9d392aff45d5a10dd4535a0c94f66bac68c83d207925ad8dd17002cdcd" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/octet-stream" http.response.duration=120.188232ms http.response.status=206 http.response.written=3826                                       
Nov  8 09:39:27 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:39:26 +0000] "GET /v2/golang1.17/blobs/sha256:81c44e9d392aff45d5a10dd4535a0c94f66bac68c83d207925ad8dd17002cdcd HTTP/1.1" 206 3826 "" "buildkit/v0.10-dev"                                                                            
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.231877138Z" level=error msg="response completed with error" err.code="blob unknown" err.detail=sha256:2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3 err.message="blob unknown to registry" go.version=go1.11.4 http.request.host=registry http.request.id=cd2b75f5-b6ad-4fe2-a6ba-e6ae875dba88 http.request.method=GET http.request.remoteaddr="127.0.0.1:60074" http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/sha256:2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/json; charset=utf-8" http.response.duration=61.484864ms http.response.status=404 http.response.written=157 vars.digest="sha256:2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3" vars.name="repos/releng/blubber/buildkit"                                     
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "GET /v2/repos/releng/blubber/buildkit/blobs/sha256:2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3 HTTP/1.1" 404 157 "" "buildkit/v0.10-dev"                                                          
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.240444594Z" level=error msg="response completed with error" err.code="blob unknown" err.detail=sha256:86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da err.message="blob unknown to registry" go.version=go1.11.4 http.request.host=registry http.request.id=eef0d217-f3cc-4612-bf5e-6e8d494ed5de http.request.method=GET http.request.remoteaddr="127.0.0.1:60072" http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/sha256:86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/json; charset=utf-8" http.response.duration=67.004066ms http.response.status=404 http.response.written=157 vars.digest="sha256:86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da" vars.name="repos/releng/blubber/buildkit"                                     
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "GET /v2/repos/releng/blubber/buildkit/blobs/sha256:86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da HTTP/1.1" 404 157 "" "buildkit/v0.10-dev"                                                          
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.269692925Z" level=error msg="response completed with error" err.code="blob unknown" err.detail=sha256:d84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347 err.message="blob unknown to registry" go.version=go1.11.4 http.request.host=registry http.request.id=73195c8d-1e3f-40e2-9c7e-896295992576 http.request.method=GET http.request.remoteaddr="127.0.0.1:60070" http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/sha256:d84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/json; charset=utf-8" http.response.duration=99.538372ms http.response.status=404 http.response.written=157 vars.digest="sha256:d84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347" vars.name="repos/releng/blubber/buildkit"                                     
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "GET /v2/repos/releng/blubber/buildkit/blobs/sha256:d84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347 HTTP/1.1" 404 157 "" "buildkit/v0.10-dev"                                                          
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.315219237Z" level=error msg="response completed with error" err.code="blob unknown" err.detail=sha256:57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85 err.message="blob unknown to registry" go.version=go1.11.4 http.request.host=registry http.request.id=b43971ee-691d-4983-93c1-cfee2b705461 http.request.method=GET http.request.remoteaddr="127.0.0.1:60076" http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/sha256:57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/json; charset=utf-8" http.response.duration=139.940917ms http.response.status=404 http.response.written=157 vars.digest="sha256:57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85" vars.name="repos/releng/blubber/buildkit"                                    
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "GET /v2/repos/releng/blubber/buildkit/blobs/sha256:57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85 HTTP/1.1" 404 157 "" "buildkit/v0.10-dev"                                                          
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.612707987Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=docker-registry.discovery.wmnet http.request.id=00660bd1-3abd-4752-841e-6f465c1d8ea3 http.request.method=POST http.request.remoteaddr=10.192.16.161
http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/" http.request.useragent="buildkit/v0.10-dev" http.response.duration=342.488149ms http.response.status=202 http.response.written=0                                                                                                                   
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "" "buildkit/v0.10-dev"                                                                                                                          
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.638296026Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=docker-registry.discovery.wmnet http.request.id=3926a9f5-3c75-4cb7-8eb1-3a7fd70fa3d9 http.request.method=POST http.request.remoteaddr=10.192.16.161
http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/" http.request.useragent="buildkit/v0.10-dev" http.response.duration=368.471841ms http.response.status=202 http.response.written=0                                                                                                                   
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "" "buildkit/v0.10-dev"                                                                                                                          
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.743152787Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=docker-registry.discovery.wmnet http.request.id=7268c4a0-eaef-42b6-ac18-8ba2886fe9f4 http.request.method=POST http.request.remoteaddr=10.192.16.161
http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/" http.request.useragent="buildkit/v0.10-dev" http.response.duration=466.755682ms http.response.status=202 http.response.written=0                                                                                                                   
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "" "buildkit/v0.10-dev"                                                                                                                          
Nov  8 09:40:38 registry2003 docker-registry[7573]: time="2022-11-08T09:40:38.75960913Z" level=info msg="response completed" go.version=go1.11.4 http.request.host=docker-registry.discovery.wmnet http.request.id=0c34ffaf-e113-4da3-995c-5b75ecda14ec http.request.method=POST http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/" http.request.useragent="buildkit/v0.10-dev" http.response.duration=441.781431ms http.response.status=202 http.response.written=0                                                                                                                    
Nov  8 09:40:38 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "" "buildkit/v0.10-dev"                                                                                                                          
Nov  8 09:40:39 registry2003 docker-registry[7573]: time="2022-11-08T09:40:39.866075472Z" level=info msg="response completed" go.version=go1.11.4 http.request.contenttype="application/octet-stream" http.request.host=docker-registry.discovery.wmnet http.request.id=535c1875-0732-45af-9767-8495998604f2 http.request.method=PUT http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/0a3bf978-932e-4921-b998-e79ce3dc925e?_state=x2CSt0fWRDC3bXJvoWzdbFg2ol7__8uguY0GnjJDxid7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiIwYTNiZjk3OC05MzJlLTQ5MjEtYjk5OC1lNzljZTNkYzkyNWUiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMzIzMjIwNzk2WiJ9&digest=sha256%3A57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85" http.request.useragent="buildkit/v0.10-dev" http.response.duration=1.096871397s http.response.status=201 http.response.written=0                   
Nov  8 09:40:39 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/0a3bf978-932e-4921-b998-e79ce3dc925e?_state=x2CSt0fWRDC3bXJvoWzdbFg2ol7__8uguY0GnjJDxid7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiIwYTNiZjk3OC05MzJlLTQ5MjEtYjk5OC1lNzljZTNkYzkyNWUiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMzIzMjIwNzk2WiJ9&digest=sha256%3A57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85 HTTP/1.1" 201 0 "" "buildkit/v0.10-dev"                                                                              
Nov  8 09:40:39 registry2003 docker-registry[7573]: time="2022-11-08T09:40:39.879147328Z" level=info msg="response completed" go.version=go1.11.4 http.request.contenttype="application/octet-stream" http.request.host=docker-registry.discovery.wmnet http.request.id=55b9a3d4-8311-4b04-812e-c1f853d3a944 http.request.method=PUT http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/d652108d-b99b-4248-93f8-d558362eff00?_state=Fs2MZ8QOf-P_HSfv0f2S5JK7Cw-D4wsNVACaeyblfYh7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiJkNjUyMTA4ZC1iOTliLTQyNDgtOTNmOC1kNTU4MzYyZWZmMDAiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjc4NjYwMzg3WiJ9&digest=sha256%3Ad84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347" http.request.useragent="buildkit/v0.10-dev" http.response.duration=1.129097238s http.response.status=201 http.response.written=0                   
Nov  8 09:40:39 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/d652108d-b99b-4248-93f8-d558362eff00?_state=Fs2MZ8QOf-P_HSfv0f2S5JK7Cw-D4wsNVACaeyblfYh7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiJkNjUyMTA4ZC1iOTliLTQyNDgtOTNmOC1kNTU4MzYyZWZmMDAiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjc4NjYwMzg3WiJ9&digest=sha256%3Ad84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347 HTTP/1.1" 201 0 "" "buildkit/v0.10-dev"                                                                              
Nov  8 09:40:39 registry2003 docker-registry[7573]: time="2022-11-08T09:40:39.909519791Z" level=info msg="response completed" go.version=go1.11.4 http.request.contenttype="application/octet-stream" http.request.host=docker-registry.discovery.wmnet http.request.id=6c3efc92-0590-41e8-8e69-ace8424444ed http.request.method=PUT http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/2070e40f-1cf5-4f32-8c80-1415a5718504?_state=-uDMTPszplKPFdzI6h6_Xyy1MOMZHnNns7-ygQlclyx7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiIyMDcwZTQwZi0xY2Y1LTRmMzItOGM4MC0xNDE1YTU3MTg1MDQiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjczMjUyMzA1WiJ9&digest=sha256%3A86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da" http.request.useragent="buildkit/v0.10-dev" http.response.duration=1.254979655s http.response.status=201 http.response.written=0                   
Nov  8 09:40:39 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/2070e40f-1cf5-4f32-8c80-1415a5718504?_state=-uDMTPszplKPFdzI6h6_Xyy1MOMZHnNns7-ygQlclyx7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiIyMDcwZTQwZi0xY2Y1LTRmMzItOGM4MC0xNDE1YTU3MTg1MDQiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjczMjUyMzA1WiJ9&digest=sha256%3A86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da HTTP/1.1" 201 0 "" "buildkit/v0.10-dev"                                                                              
Nov  8 09:40:39 registry2003 docker-registry[7573]: time="2022-11-08T09:40:39.941475388Z" level=info msg="response completed" go.version=go1.11.4 http.request.contenttype="application/octet-stream" http.request.host=docker-registry.discovery.wmnet http.request.id=eca13619-25df-4542-bbea-69c1a68aa399 http.request.method=PUT http.request.remoteaddr=10.192.16.161 http.request.uri="/v2/repos/releng/blubber/buildkit/blobs/uploads/693a526a-54af-4698-bec0-7fad9c076d73?_state=qo-sXaj7igi4KQ_5V6XispGka_FDNur_csKsYwUhq4h7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiI2OTNhNTI2YS01NGFmLTQ2OTgtYmVjMC03ZmFkOWMwNzZkNzMiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjczMTM3MDcxWiJ9&digest=sha256%3A2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3" http.request.useragent="buildkit/v0.10-dev" http.response.duration=1.233868494s http.response.status=201 http.response.written=0                   
Nov  8 09:40:39 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:38 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/693a526a-54af-4698-bec0-7fad9c076d73?_state=qo-sXaj7igi4KQ_5V6XispGka_FDNur_csKsYwUhq4h7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiI2OTNhNTI2YS01NGFmLTQ2OTgtYmVjMC03ZmFkOWMwNzZkNzMiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjczMTM3MDcxWiJ9&digest=sha256%3A2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3 HTTP/1.1" 201 0 "" "buildkit/v0.10-dev"                                                                              
Nov  8 09:40:40 registry2003 docker-registry[7573]: time="2022-11-08T09:40:40.093903898Z" level=error msg="response completed with error" err.code="manifest unknown" err.detail="unknown manifest name=repos/releng/blubber/buildkit revision=sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5"
err.message="manifest unknown" go.version=go1.11.4 http.request.host=registry http.request.id=ef843c3e-afcb-4723-8a5c-aedf6571a1ad http.request.method=GET http.request.remoteaddr="127.0.0.1:60104" http.request.uri="/v2/repos/releng/blubber/buildkit/manifests/sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5" http.request.useragent="buildkit/v0.10-dev" http.response.contenttype="application/json; charset=utf-8" http.response.duration=143.583916ms http.response.status=404 http.response.written=205 vars.name="repos/releng/blubber/buildkit" vars.reference="sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5"
Nov  8 09:40:40 registry2003 docker-registry[7573]: 127.0.0.1 - - [08/Nov/2022:09:40:39 +0000] "GET /v2/repos/releng/blubber/buildkit/manifests/sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5 HTTP/1.1" 404 205 "" "buildkit/v0.10-dev"

(linking to T320811)

Thanks for debugging this further, @JMeybohm and @hashar. In your recent re-run of the job it seems the manifest list is accepted by the registry, and it's the subsequent manifest push that fails, so you're right this doesn't seem to be related to a lack of manifest list support for perhaps some errant behavior on buildkit's side that manifests under this multi-platform push condition. I'll re-word and triage accordingly.

27 exporting manifest sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5 done
#27 exporting config sha256:d84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347 done
#27 exporting manifest sha256:339e830b18379f74bf67d6257a0c346327daa931e0c5a752d2998de3a4df6ba6 done
#27 exporting config sha256:57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85 done
#27 exporting manifest list sha256:1134f868ab106022c275b5d7c0034c7e2d7c6bbdea292f726d2b673655483be1 done
#27 pushing layers
#27 pushing layers 1.8s done
#27 pushing manifest for docker-registry.discovery.wmnet/repos/releng/blubber/buildkit:v0.14.0@sha256:1134f868ab106022c275b5d7c0034c7e2d7c6bbdea292f726d2b673655483be1
time="2022-11-08T09:40:40Z" level=debug msg="stopping session" spanID=3f66ceb23362436e traceID=c2a85367fe1b636fd5258357d4255842
#27 pushing manifest for docker-registry.discovery.wmnet/repos/releng/blubber/buildkit:v0.14.0@sha256:1134f868ab106022c275b5d7c0034c7e2d7c6bbdea292f726d2b673655483be1 0.2s done
#27 ERROR: failed commit on ref "manifest-sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5": cannot reuse body, request must be retried

It looks like is/was a bug in containerd that may be the culprit. I'll have to debug further.

dduvall renamed this task from WMF container registry does not accept a manifest list (aka OCI manifest index, or "fat" manifest) to Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push.Nov 8 2022, 5:27 PM
dduvall updated the task description. (Show Details)

@JMeybohm can you provide the nginx access log entries from that time period as well? I'm trying to rule out auth failure as a factor and docker-registry log entries do not include the subrequests between nginx and jwt-authorizer.

Mentioned in SAL (#wikimedia-releng) [2022-11-08T20:01:22Z] <dduvall> temporarily enabling buildkitd debug logging on gitlab-runner hosts (T322453)

I enabled debug logging for buildkitd on the gitlab-runner hosts and re-ran the failed job. It appears that auth failure may be the root cause—the client is definitely misbehaving but the there should not be auth failure here. It's unclear why.

Nov 08 20:06:28 gitlab-runner1002 docker[1075791]: time="2022-11-08T20:06:28Z" level=debug msg="fetch response received" digest="sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75" mediatype=application/vnd.docker.distribution.manifest.v2+json response.header.connection=keep-alive response.header.content-length=205 response.header.content-type="application/json; charset=utf-8" response.header.date="Tue, 08 Nov 2022 20:06:28 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.server=nginx/1.14.2 response.header.x-cache-status=MISS response.header.x-content-type-options=nosniff response.status="404 Not Found" size=739 url="https://docker-registry.discovery.wmnet/v2/repos/releng/blubber/buildkit/manifests/sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75"
Nov 08 20:06:28 gitlab-runner1002 docker[1075791]: time="2022-11-08T20:06:28Z" level=debug msg="do request" digest="sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75" mediatype=application/vnd.docker.distribution.manifest.v2+json request.header.content-type=application/vnd.docker.distribution.manifest.v2+json request.header.user-agent=buildkit/v0.10-dev request.method=PUT size=739 url="https://docker-registry.discovery.wmnet/v2/repos/releng/blubber/buildkit/manifests/sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75"
Nov 08 20:06:28 gitlab-runner1002 docker[1075791]: time="2022-11-08T20:06:28Z" level=debug msg="fetch response received" digest="sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75" mediatype=application/vnd.docker.distribution.manifest.v2+json response.header.connection=keep-alive response.header.content-length=195 response.header.content-type=text/html response.header.date="Tue, 08 Nov 2022 20:06:28 GMT" response.header.server=nginx/1.14.2 response.header.www-authenticate="Bearer realm=\"https://gitlab.wikimedia.org/jwt/auth\", service=\"jwt_authorizer\", scope=\"repository::push,pull\"" response.status="401 Unauthorized" size=739 url="https://docker-registry.discovery.wmnet/v2/repos/releng/blubber/buildkit/manifests/sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75"
Nov 08 20:06:28 gitlab-runner1002 docker[1075791]: time="2022-11-08T20:06:28Z" level=debug msg=Unauthorized digest="sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75" header="Bearer realm=\"https://gitlab.wikimedia.org/jwt/auth\", service=\"jwt_authorizer\", scope=\"repository::push,pull\"" mediatype=application/vnd.docker.distribution.manifest.v2+json size=739
Nov 08 20:06:28 gitlab-runner1002 docker[1075791]: time="2022-11-08T20:06:28Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed commit on ref \"manifest-sha256:86d64a41a479ea8910e7d5b7dc6d2a89311079e408d7e0007ba1b580a29dbb75\": cannot reuse body, request must be retried\n"

I recently modified to improve the validation error reporting, specifically to include the validation error response in the response. I'll see if I can get an updated version of that deployed to the registry hosts.

Mentioned in SAL (#wikimedia-releng) [2022-11-08T20:17:01Z] <dduvall> puppet re-enabled on gitlab-runner hosts (T322453) normal log level will be restored on next puppet run

@JMeybohm can you provide the nginx access log entries from that time period as well? I'm trying to rule out auth failure as a factor and docker-registry log entries do not include the subrequests between nginx and jwt-authorizer.

It looks as if nginx logs confirm your theory:

10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "HEAD /v2/repos/releng/blubber/buildkit/blobs/sha256:2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3 HTTP/1.1" 404 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "HEAD /v2/repos/releng/blubber/buildkit/blobs/sha256:86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da HTTP/1.1" 404 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 401 195 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 401 195 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "HEAD /v2/repos/releng/blubber/buildkit/blobs/sha256:d84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347 HTTP/1.1" 404 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "HEAD /v2/repos/releng/blubber/buildkit/blobs/sha256:57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85 HTTP/1.1" 404 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:38 +0000] "POST /v2/repos/releng/blubber/buildkit/blobs/uploads/ HTTP/1.1" 202 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:39 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/0a3bf978-932e-4921-b998-e79ce3dc925e?_state=x2CSt0fWRDC3bXJvoWzdbFg2ol7__8uguY0GnjJDxid7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiIwYTNiZjk3OC05MzJlLTQ5MjEtYjk5OC1lNzljZTNkYzkyNWUiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMzIzMjIwNzk2WiJ9&digest=sha256%3A57b6a04c9a60fedb52b13d1a3f14c9f794ecd1545243727feeeb72503b1d4d85 HTTP/1.1" 201 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:39 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/d652108d-b99b-4248-93f8-d558362eff00?_state=Fs2MZ8QOf-P_HSfv0f2S5JK7Cw-D4wsNVACaeyblfYh7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiJkNjUyMTA4ZC1iOTliLTQyNDgtOTNmOC1kNTU4MzYyZWZmMDAiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjc4NjYwMzg3WiJ9&digest=sha256%3Ad84b41282ac2d330e83207439b63e129bf74fd652ba7a978fbc88e449cef1347 HTTP/1.1" 201 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:39 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/2070e40f-1cf5-4f32-8c80-1415a5718504?_state=-uDMTPszplKPFdzI6h6_Xyy1MOMZHnNns7-ygQlclyx7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiIyMDcwZTQwZi0xY2Y1LTRmMzItOGM4MC0xNDE1YTU3MTg1MDQiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjczMjUyMzA1WiJ9&digest=sha256%3A86d5ff669d1b2bfbdb7bd70679607a40ba519a1bb929bc4669e7d14f602932da HTTP/1.1" 201 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:39 +0000] "PUT /v2/repos/releng/blubber/buildkit/blobs/uploads/693a526a-54af-4698-bec0-7fad9c076d73?_state=qo-sXaj7igi4KQ_5V6XispGka_FDNur_csKsYwUhq4h7Ik5hbWUiOiJyZXBvcy9yZWxlbmcvYmx1YmJlci9idWlsZGtpdCIsIlVVSUQiOiI2OTNhNTI2YS01NGFmLTQ2OTgtYmVjMC03ZmFkOWMwNzZkNzMiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMTEtMDhUMDk6NDA6MzguMjczMTM3MDcxWiJ9&digest=sha256%3A2304af3f0bad73d77281d9ed64450410263f761b2d537ace4a1a8de8612789a3 HTTP/1.1" 201 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:40 +0000] "HEAD /v2/repos/releng/blubber/buildkit/manifests/sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5 HTTP/1.1" 404 0 "-" "buildkit/v0.10-dev"
10.192.16.161 - - [08/Nov/2022:09:40:40 +0000] "PUT /v2/repos/releng/blubber/buildkit/manifests/sha256:24b07abcb824307bb4d16047a850319b38e3555af08377cedc51e1407cb180d5 HTTP/1.1" 401 195 "-" "buildkit/v0.10-dev"

jwt-authorizer has been upgraded to 1.1.0 on the 4 registry hosts. Please try again.

I'm finally circling back to this, and I ran another test yesterday. The additional logging in jwt-authorizer confirms that this is an auth failure due to the token scope not matching the request URL. However, the reason for this mismatch is strange. (See below.)

Mar 21 22:11:59 gitlab-runner2002 docker[2112472]: time="2023-03-21T22:11:59Z" level=debug msg="do request" digest="sha256:7843dd993714dbd0f255f40aaecbeeaac85c86c1616956c6d7e419fe8169358d" mediatype=application/vnd.docker.distribution.manifest.v2+json request.header.content-type=application/vnd.docker.distribution.manifest.v2+json request.header.user-agent=buildkit/v0.10-dev request.method=PUT size=739 url="https://docker-registry.discovery.wmnet/v2/repos/releng/blubber/buildkit/manifests/sha256:7843dd993714dbd0f255f40aaecbeeaac85c86c1616956c6d7e419fe8169358d"
Mar 21 22:11:59 gitlab-runner2002 docker[2112472]: time="2023-03-21T22:11:59Z" level=debug msg="fetch response received" digest="sha256:7843dd993714dbd0f255f40aaecbeeaac85c86c1616956c6d7e419fe8169358d" mediatype=application/vnd.docker.distribution.manifest.v2+json response.header.connection=keep-alive response.header.content-length=195 response.header.content-type=text/html response.header.date="Tue, 21 Mar 2023 22:11:59 GMT" response.header.server=nginx/1.14.2 response.header.www-authenticate="Bearer realm=\"https://gitlab.wikimedia.org/jwt/auth\", service=\"jwt_authorizer\", scope=\"repository::push,pull\", error=\"invalid_token\", error_description=\"token is invalid: Image name is not equal to or beneath the project path repos/releng/blubber.\"" response.status="401 Unauthorized" size=739 url="https://docker-registry.discovery.wmnet/v2/repos/releng/blubber/buildkit/manifests/sha256:7843dd993714dbd0f255f40aaecbeeaac85c86c1616956c6d7e419fe8169358d"
Mar 21 22:11:59 gitlab-runner2002 docker[2112472]: time="2023-03-21T22:11:59Z" level=debug msg=Unauthorized digest="sha256:7843dd993714dbd0f255f40aaecbeeaac85c86c1616956c6d7e419fe8169358d" header="Bearer realm=\"https://gitlab.wikimedia.org/jwt/auth\", service=\"jwt_authorizer\", scope=\"repository::push,pull\", error=\"invalid_token\", error_description=\"token is invalid: Image name is not equal to or beneath the project path repos/releng/blubber.\"" mediatype=application/vnd.docker.distribution.manifest.v2+json size=739
Mar 21 22:11:59 gitlab-runner2002 docker[2112472]: time="2023-03-21T22:11:59Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed commit on ref \"manifest-sha256:7843dd993714dbd0f255f40aaecbeeaac85c86c1616956c6d7e419fe8169358d\": cannot reuse body, request must be retried\n"
Mar 21 22:11:59 gitlab-runner2002 docker[2112472]: time="2023-03-21T22:11:59Z" level=debug msg="session finished: <nil>"

What I did notice this time around (which was in the previous logging but I had overlooked) is that the challenge scope from jwt-authorizer is repository::push,pull. This value is constructed by jwt-authorizer as "repository:" + req.URL.String()[1:] + ":push,pull" so the value we're seeing suggests that jwt-authorizer did not receive the original request URL for some reason. I'd love to verify this but we still don't have access to jwt-authorizer logging or the registry's nginx logs (see outstanding task T322579: give releng access to logs to debug buildkit-to-wmf-registry publishing).

Can someone please take a look at the logs for jwt-authorizer and nginx on the registry hosts and:

  1. See if there's an entry for jwt-authorizer that corresponds to the PUT request. It should look something like authorizing request: PUT <raw request URI>
  2. Tell us what the nginx access log entry looks like for the subrequest between nginx and jwt-authorizer (via auth_request).
  3. Confirm again what the original client request looks like in the nginx access log, though from the buildkitd logs and the prior entries it seems the request URL is as expected.

I'm suspecting some kind of misconfiguration in nginx that is losing the original request URL prior to the auth request but since this seems to only happens during multi-manifest pushes the behavior is strange indeed.

We don't have access to the actual requests obviously, but the logs I can provide in the hope that this helps:

zgrep -E '21/Mar/2023:22:1(1:5|2:0)' /var/log/nginx/access.log.2.gz

zgrep -E 'Mar 21 22:1(1:5|2:0)' /var/log/syslog.2.gz

Thank you, @JMeybohm. That's very helpful.

It does seem like the auth_request to jwt-authorizer is lacking the original URL for some reason:

10.192.16.161 - - [21/Mar/2023:22:11:59 +0000] "PUT /v2/repos/releng/blubber/buildkit/manifests/sha256:7843dd993714dbd0f255f40aaecbeeaac85c86c1616956c6d7e419fe8169358d HTTP/1.1" 401 195 "-" "buildkit/v0.10-dev"
Mar 21 22:11:59 registry2003 jwt-authorizer[2911]: 2023/03/21 22:11:59 authorizing request: PUT /
Mar 21 22:11:59 registry2003 jwt-authorizer[2911]: 2023/03/21 22:11:59 fetching JWKs...
Mar 21 22:11:59 registry2003 jwt-authorizer[2911]: 2023/03/21 22:11:59 validating JWT...

It would seem that there may be something wrong in the registry's nginx config around JWT auth, specifically the use of a location match and subsequent set $auth_request_path $1 that is later used in the auth subrequest's proxy_pass.

# Capture the original request path here so we can pass it to
# jwt-authorizer in /auth/jwt (the use of rewrite in /auth means we'd
# otherwise lose track of it)
location ~ ^/v2/(.*) {
  set $auth_request_path $1;
  # ...
}

# ...

location = /auth/jwt {
  internal;

  # Cache JWT authorization as well but be less aggressive about it, only
  # caching successful auths for 1 minute. Note that if the jwt-authorizer
  # is ever refactored to use more than the request path and authorization
  # header to make decisions, the cache key will need to be changed
  # accordingly.
  proxy_cache auth;
  proxy_cache_key "jwt $request_uri$http_authorization";
  proxy_cache_valid 200 202 1m;

  proxy_pass http://unix:<%= @jwt_authorizer_socket %>:/$auth_request_path;
  # ...
}

Maybe there's an nginx precedence gotcha here somewhere. I will try to repro it locally.

Change 904241 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/puppet@production] docker_registry_ha: Do not lose original request URI during JWT auth

https://gerrit.wikimedia.org/r/904241

I was able to reproduce the problem locally and it has to do with how nginx (doesn't) apply the set $auth_request_path to the nested location block that matches exact manifest/blob digests. From the commit message of the associated patch:

Nginx is quite strange in how it handles nested locations and which
directives from parent `location` blocks apply and which don't, and
there is nothing in the documentation for individual directives that
specifies when that is the case.

In this case, for example, consider this pared down part of the former
configuration.

    location ~ ^/v2/(.*) {
      set $auth_request_path $1;

      # ...

      auth_request /auth;
      auth_request_set $auth_status $upstream_status;

      proxy_pass http://registry;
      # ... (proxy options)

      location ~ ^/v2/(.*)/(blobs|manifests)/sha256:(.*)) {
        proxy_pass http://registry;
        include /etc/nginx/registry-nginx-cache.conf;
      }

One might assume that the `set $auth_request_path $1` at the top of the
parent location would apply to the nested location, since the other
directives like `auth_request` seem to (requests to exact digests are
being auth'd). However, the `set` does _not_ apply. Locally reproducing
this configuration and setting `error_log` to `debug` confirms this.

    2023/03/29 15:41:45 [warn] 27#27: *3 using uninitialized "auth_request_path" variable ...

See the patch for the workaround (essentially repeat the set in the nested location).

Change 904241 merged by Alexandros Kosiaris:

[operations/puppet@production] docker_registry_ha: Do not lose original request URI during JWT auth

https://gerrit.wikimedia.org/r/904241

dduvall claimed this task.
dduvall added a subscriber: akosiaris.

Confirmed that we can now push blubber's multi platform image to our registry. See https://gitlab.wikimedia.org/repos/releng/blubber/-/jobs/90258

$ docker manifest inspect docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v0.18.0
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 739,
         "digest": "sha256:1b0ba94136330a22a11bb6f2d8d0ddcbb8980e2f1bc91402ab341613fef43042",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 739,
         "digest": "sha256:970b4da449f18497220dc475ead152c4eac0de0a489dc461dccaba3fd92c3dbc",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

Thanks for the help all, and thanks for the review/merge @akosiaris !