I want to be able to see what images exist on the wikimedia docker registry, as well as what tags they have, when they were added etc.
Currently HTTP requests just get 404 page not found
Addshore, Nov 3 2017, 3:01 PM
F34011297: Screenshot_2021-01-21 Wikimedia Docker - Image alpine.png (Jan 21 2021, 7:47 PM)
F34011298: Screenshot_2021-01-21 Wikimedia Docker - Images.png (Jan 21 2021, 7:47 PM)
https://docker-registry.wikimedia.org/v2/_catalog
{ "repositories": [ "alpine", "calico/kube-policy-controller", "calico/node", "fluent-bit", "fluentd", "kubernetes-fluentd-daemonset", "nodejs-devel", "nodejs-slim", "pause", "prometheus-statsd-exporter", "python3", "python3-build-jessie", "python3-build-stretch", "python3-devel", "ruby", "servermon", "statsd-proxy", "wikimedia-jessie", "wikimedia-stretch", "wikimedia/mediawiki-services-mathoid" ] }
And for a given image/tag:
https://docker-registry.wikimedia.org/v2/wikimedia-jessie/manifests/latest
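The two URLs above follow a regular pattern, so they can be built for any image and tag. Below is a minimal sketch of that pattern; the function names are hypothetical helpers made up for illustration, and nothing here talks to the network.

```python
import json
from urllib.parse import quote

REGISTRY = "https://docker-registry.wikimedia.org"


def catalog_url(registry=REGISTRY):
    """URL that lists the repositories known to the registry."""
    return f"{registry}/v2/_catalog"


def manifest_url(image, tag, registry=REGISTRY):
    """URL of the manifest for one image/tag pair."""
    # Image names may contain "/" (e.g. "calico/node"); keep it unescaped.
    return f"{registry}/v2/{quote(image, safe='/')}/manifests/{quote(tag, safe='')}"


def parse_catalog(body):
    """Return the repository names from a /v2/_catalog JSON body."""
    return json.loads(body)["repositories"]


# A shortened copy of the catalog response quoted above.
sample = '{ "repositories": [ "alpine", "calico/node", "wikimedia-jessie" ] }'
print(parse_catalog(sample))
print(manifest_url("wikimedia-jessie", "latest"))
```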
And there are probably half a dozen frontends :]
I switched the title from UI to Homepage to keep things a little more basic.
I would be happy with a static HTML page that just linked to the URL(s) that you just linked in the ticket!
Even just a redirect to https://docker-registry.wikimedia.org/v2/_catalog would be nice.
Portus seems like an interesting approach to two problems: authn+authz, and a UI for a docker registry
http://port.us.org/features.html
I'll take a harder look as soon as I have time for it. Maybe during the ops offsite this weekend.
I made a little command line tool to help me find images: https://gist.github.com/thcipriani/7d7633eb238cd868d5ba24d0f1069463
Then I wrapped that in a bash script and generated a simple static site: https://people.wikimedia.org/~thcipriani/docker/
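For readers curious what "generate a simple static site" can amount to, here is a minimal sketch in Python. It is not the script linked above: `render_homepage` is a hypothetical name, and the image/tag mapping is made-up sample data.

```python
import html


def render_homepage(image_tags):
    """Render a bare-bones HTML listing from a {image: [tags]} mapping."""
    rows = []
    for image, tags in sorted(image_tags.items()):
        # Escape everything that comes from the registry before embedding it.
        tag_list = ", ".join(html.escape(tag) for tag in tags)
        rows.append(f"<li><b>{html.escape(image)}</b>: {tag_list}</li>")
    return "<!DOCTYPE html>\n<ul>\n" + "\n".join(rows) + "\n</ul>\n"


# Made-up sample data, only to show the shape of the input.
page = render_homepage({"ruby": ["latest"], "alpine": ["3.6", "latest"]})
print(page)
```

The output could be written to a file and served as-is by any web server.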
Right now when you visit https://docker-registry.wikimedia.org/ you get a generic nginx welcome page. I propose having this URL redirect to https://dockerregistry.toolforge.org/
Change 650215 had a related patch set uploaded (by Ahmon Dancy; owner: Ahmon Dancy):
[operations/puppet@production] Redirect top level URl to https://dockerregistry.toolforge.org/
Change 654725 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] [WIP] docker_registry_ha: Add a script to generate a static HTML homepage
Change 655792 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Have nginx serve /srv/hompage for /
Change 654725 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Add a script to generate a static HTML homepage
Mentioned in SAL (#wikimedia-operations) [2021-01-20T00:09:25Z] <legoktm> uploaded docker-report 0.0.4-1~deb9u1 to stretch-wikimedia (T179696)
Change 657210 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Make registery-homepage-builder Python 3.5 compatible
Change 657210 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Make registry-homepage-builder Python 3.5 compatible
Got pretty close, one last sticking point is that docker_report hardcodes connecting to the registry over HTTPS. So if you try https://localhost then you'll end up with requests.exceptions.SSLError: hostname 'localhost' doesn't match either of 'docker-registry.discovery.wmnet', 'docker-registry.svc.eqiad.wmnet', 'docker-registry.svc.codfw.wmnet', 'docker-registry.wikimedia.org'. And of course https://localhost:5000 (the HTTP port) fails with a protocol error.
Should we adapt docker_report to allow connecting over HTTP? I thought about using one of the domain names but then we're generating a homepage for a different registry, not the one that instance is serving...or does it not matter?
Change 657216 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Disable build-homepage job for now
Change 657216 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Disable build-homepage job for now
Or potentially have a flag to not do certificate validation. But we probably want to allow HTTP as well. It should make it easier to test/use it locally as well. Plus, it should make the script a bit faster.
Given the banner page we're creating is for use by the public, I think it can simply run against the public registry address. And yes, it should not matter as the storage is shared (it's swift containers).
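To make the HTTP-versus-HTTPS question concrete, here is a minimal sketch of a client that lets the caller pick the scheme. The `use_https` parameter and the function name are assumptions for illustration, not docker_report's actual interface.

```python
def registry_base_url(host, use_https=True):
    """Build the base URL for a registry host.

    `use_https` is a hypothetical switch: True for the public TLS
    endpoint, False for a plain-HTTP port on the local machine.
    """
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}"


# Public address, certificate matches the hostname.
print(registry_base_url("docker-registry.wikimedia.org"))
# Local plain-HTTP port mentioned earlier in the thread.
print(registry_base_url("localhost:5000", use_https=False))
```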
For the record, I'm running the script as follows on registry1001:
/usr/local/bin/registry-homepage-builder docker-registry.wikimedia.org /root/homepage-test
to perform a test run, and it seems to work as expected.
For the record, I got the script running, by using
/usr/local/bin/registry-homepage-builder docker-registry.wikimedia.org /root/homepage-test
as arguments; and then I stumbled upon another problem: fetching all tags for some images consistently gets a 504 timeout (meaning we exceed the timeout of the TLS terminator on the registry hosts).
Either we ignore it by just catching requests.exceptions.HTTPError, or we fetch fewer than 100 tags per request.
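The second option, smaller pages, could look roughly like the sketch below. It pages through the catalog with the `n` and `last` query parameters; `fetch` is a hypothetical callable standing in for the real HTTP client, and stopping on a short page is a simplification rather than how docker_report decides it is done.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def iter_catalog(fetch, page_size=100):
    """Yield repository names, asking for `page_size` entries per request."""
    last = None
    while True:
        params = {"n": page_size}
        if last is not None:
            params["last"] = last  # resume after the last name already seen
        page = fetch("/v2/_catalog?" + urlencode(params))["repositories"]
        yield from page
        if len(page) < page_size:  # short (or empty) page: nothing left
            return
        last = page[-1]


def fake_fetch(url_part):
    """Offline stand-in for the registry, serving a fixed sorted list."""
    names = ["alpine", "pause", "python3", "ruby", "servermon"]
    query = parse_qs(urlsplit(url_part).query)
    n = int(query["n"][0])
    last = query.get("last", [None])[0]
    start = names.index(last) + 1 if last else 0
    return {"repositories": names[start:start + n]}


print(list(iter_catalog(fake_fetch, page_size=2)))
```

A smaller `page_size` means more requests, but each one has less work to do before the proxy timeout hits.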
Ack, sounds good to me.
This was fixed in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/docker-report/+/2b57ec7cc2040c34d67b7dd94556f6a1d72ebafe - I can cherry-pick a smaller version of that on top of the 0.0.4 stretch package.
Change 657412 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Enable and fix build-homepage job
Change 657412 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Enable and fix build-homepage job
Change 655792 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Have nginx serve /srv/homepage for /
https://docker-registry.wikimedia.org/ ta-da
Tested by:
It looks like the build-homepage job is failing because of HTTP 504 timeouts though, despite me applying the automatic retry patch to the stretch version of docker-report. Especially when all 4 hosts run the job on the hour at the same time. When just one is running, it most likely will work.
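As a rough illustration of what an automatic retry does (and why it does not help when every attempt times out), here is a minimal sketch. `GatewayTimeout` and `with_retries` are hypothetical names; the real patch in docker-report may be structured quite differently.

```python
import time


class GatewayTimeout(Exception):
    """Stand-in for an HTTP 504 response."""


def with_retries(request, attempts=3, delay=0.0, sleep=time.sleep):
    """Call `request()` up to `attempts` times, retrying on GatewayTimeout."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except GatewayTimeout:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(delay)


calls = []


def flaky():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls.append(1)
    if len(calls) < 3:
        raise GatewayTimeout()
    return "ok"


result = with_retries(flaky)
print(result, len(calls))
```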
Change 657446 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Randomize timing of build-homepage job
Change 657446 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Randomize timing of build-homepage job
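One way to picture the idea of keeping the hosts from all starting at once is a per-host offset. The sketch below derives a stable offset from the hostname; this is a swapped-in illustration, not what the puppet change does, and `start_offset_seconds` is a hypothetical name.

```python
import hashlib


def start_offset_seconds(hostname, window=3600):
    """Map a hostname to a stable delay within `window` seconds."""
    # hashlib gives the same digest on every run, unlike the built-in hash().
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % window


for host in ("registry1001", "registry2002"):
    print(host, start_offset_seconds(host))
```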
Thanks Legoktm. Small feature request: Can you add "last updated at <blah>" text to the top right corner of the page?
Change 657678 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Add timestamp to build-homepage output
Change 657678 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Add timestamp to build-homepage output
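A timestamp line of this kind can be produced with the standard library alone. This is a minimal sketch with a hypothetical function name and an assumed output format, not the text the merged patch emits.

```python
from datetime import datetime, timezone


def last_updated_line(now=None):
    """Return a 'Last updated at ...' line in UTC.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return "Last updated at " + now.strftime("%Y-%m-%d %H:%M:%S UTC")


print(last_updated_line())
```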
The job fails on registry2002, leading to icinga alerts
Jan 25 09:30:01 registry2002 systemd[1]: Started Build docker-registry homepage.
Jan 25 09:30:01 registry2002 registry-homepage-builder[31567]: INFO:root:Fetching the image catalog for docker-registry.discovery.wmnet
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: ERROR:root:Error getting data from the registry
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: Traceback (most recent call last):
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 91, in _get_all_pages
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: resp = self._request(url_part)
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 82, in _request
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: response.raise_for_status()
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/requests/models.py", line 893, in raise_for_status
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: raise HTTPError(http_error_msg, response=self)
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://docker-registry.discovery.wmnet
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: Traceback (most recent call last):
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 91, in _get_all_pages
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: resp = self._request(url_part)
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 82, in _request
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: response.raise_for_status()
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/requests/models.py", line 893, in raise_for_status
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: raise HTTPError(http_error_msg, response=self)
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://docker-registry.discovery.wmnet
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: During handling of the above exception, another exception occurred:
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: Traceback (most recent call last):
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/local/bin/registry-homepage-builder", line 136, in <module>
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: main()
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/local/bin/registry-homepage-builder", line 122, in main
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: for image, tags in sorted(registry.get_image_tags(sort=False).items()):
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/docker_report/registry/browser.py", line 61, in get_image_tags
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: for image_name in self._get_images_list():
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/docker_report/registry/browser.py", line 50, in _get_images_list
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: for resp in self._get_all_pages("/v2/_catalog"):
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 108, in _get_all_pages
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: raise RegistryError(url_part)
Jan 25 09:32:01 registry2002 registry-homepage-builder[31567]: docker_report.registry.RegistryError: /v2/_catalog?last=releng%2Fquibble-jessie-php55&n=100
Jan 25 09:32:01 registry2002 systemd[1]: build-homepage.service: Main process exited, code=exited, status=1/FAILURE
Gateway Time-out for url: https://docker-registry.discovery.wmnet
Gotta set HTTP_PROXY/HTTPS_PROXY env variable?
legoktm@registry2002:~$ time curl "https://docker-registry.discovery.wmnet/v2/_catalog?last=releng%2Fquibble-jessie-php55&n=100"
{"repositories":["releng/quibble-jessie-php56","releng/quibble-stretch","releng/quibble-stretch-bundle","releng/quibble-stretch-hhvm","releng/quibble-stretch-php70","releng/quibble-stretch-php71","releng/quibble-stretch-php72","releng/quibble-stretch-php73","releng/quibble-stretch-php74","releng/quibble-stretch-php80","releng/rake","releng/rake-poolcounter","releng/rake-vagrant","releng/release-notes","releng/rust","releng/rust-coverage","releng/shellcheck","releng/sonar-scanner","releng/sury-php","releng/tabs","releng/tox","releng/tox-acme-chief","releng/tox-buster","releng/tox-censorshipmonitoring","releng/tox-cergen","releng/tox-certcentral","releng/tox-conftool","releng/tox-eventlogging","releng/tox-homer","releng/tox-labs-striker","releng/tox-mysqld","releng/tox-poolcounter","releng/tox-pyspark","releng/tox-pywikibot","releng/typos","releng/wikimedia-audit-resources","releng/zuul-cloner","ruby","servermon","service-checker","statsd-proxy","tiller","wikimedia/blubber","wikimedia/blubber-doc-example-helloworldoid","wikimedia/eventgate-ci","wikimedia/eventgate-wikimedia","wikimedia/mediawiki-core","wikimedia/mediawiki-libs-shellbox","wikimedia/mediawiki-services-apertium","wikimedia/mediawiki-services-change-propagation","wikimedia/mediawiki-services-chromium-render","wikimedia/mediawiki-services-citoid","wikimedia/mediawiki-services-cxserver","wikimedia/mediawiki-services-eventstreams","wikimedia/mediawiki-services-graphoid","wikimedia/mediawiki-services-kask","wikimedia/mediawiki-services-mathoid","wikimedia/mediawiki-services-mobileapps","wikimedia/mediawiki-services-parsoid","wikimedia/mediawiki-services-push-notifications","wikimedia/mediawiki-services-recommendation-api","wikimedia/mediawiki-services-restbase","wikimedia/mediawiki-services-similar-users","wikimedia/mediawiki-services-wikifeeds","wikimedia/mediawiki-services-wikispeech-mary-tts","wikimedia/mediawiki-services-wikispeech-mishkal","wikimedia/mediawiki-services-wikispeech-pronlex","wikimedia/mediawiki-services-wikispeech-symbolset","wikimedia/mediawiki-services-wikispeech-wikispeech-server","wikimedia/mediawiki-services-zotero","wikimedia/research-mwaddlink","wikimedia/wikibase-termbox","wikimedia/wikidata-query-flink-rdf-streaming-updater","wikimedia/wikimedia-portals","wikimedia/wikimedia-production","wikimedia/wvui","wikimedia-buster","wikimedia-jessie","wikimedia-stretch","wmfdebug"]}
real 0m34.496s
user 0m0.016s
sys 0m0.012s
docker-report retries this request 3 times, and it fails every time. I tried it a few times with curl manually and it works fine... I don't understand what's different about registry2002 that its requests fail but the other 3 hosts are perfectly OK.
legoktm@registry2002:~$ time curl "https://docker-registry.discovery.wmnet/v2/_catalog?last=releng%2Fquibble-jessie-php55&n=100"
<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.13.6</center>
</body>
</html>
real 1m0.036s
user 0m0.028s
sys 0m0.004s
We could also increase the timeout on the nginx side given that every client is going to retry the request anyways?
Change 658436 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Increase nginx proxy timeout to 120s
In my testing of repeatedly issuing the same curl command over and over, it usually took ~35s to respond, but sometimes it took over 1m, the worst I saw was 1m42s. So I'm proposing to raise the nginx proxy timeout to 2m to avoid needing to retry on these timeouts in the first place.
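The arithmetic behind the 2m proposal can be spelled out in a few lines. The sketch below uses the timings quoted above; the 60-second figure is an assumption inferred from the curl run that returned a 504 after almost exactly one minute.

```python
def exceeds(duration_s, proxy_timeout_s):
    """True if a response this slow would hit the proxy timeout."""
    return duration_s >= proxy_timeout_s


# Timings from the manual curl tests, in seconds (1m42s = 102s).
samples = {"typical": 35, "worst observed": 60 + 42}

for timeout in (60, 120):  # assumed current limit vs. proposed limit
    failing = [name for name, secs in samples.items() if exceeds(secs, timeout)]
    print(f"timeout={timeout}s -> would time out: {failing}")
```

With the proposed limit there are 18 seconds of headroom over the slowest response seen so far.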
Change 658436 merged by Giuseppe Lavagetto:
[operations/puppet@production] docker_registry_ha: Increase nginx proxy timeout to 120s
Alternatively, bypass nginx and talk HTTP directly to the registry software. It's on the same host anyway. It will require some modification of the software, but it will avoid such issues.
We're still experiencing timeouts when trying to gather the catalog list with the URL
/v2/_catalog?last=releng%2Fquibble-jessie-php55&n=100
This happens consistently, and only on registry2002.
Interestingly, it seems like requesting the url multiple times makes it respond in ~ 40 seconds on retries, so it's a bit strange we're still failing.
Looking at the logs from a failed run, it looks like no retry is attempted when a 504 is received, at least on registry2002.
No 504 from the registry is followed by another call, whether one that ends in a 200 or otherwise.
I see it retrying 3 times:
Jan 26 20:30:02 registry2002 registry-homepage-builder[22695]: INFO:root:Fetching the image catalog for docker-registry.discovery.wmnet
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: ERROR:root:Error getting data from the registry
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: Traceback (most recent call last):
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 91, in _get_all_pages
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: resp = self._request(url_part)
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 82, in _request
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: response.raise_for_status()
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/requests/models.py", line 893, in raise_for_status
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: raise HTTPError(http_error_msg, response=self)
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://docker-registry.discovery.wmnet/v2/_catalog?last=releng%2Fquibble-jessie-php55&n=100
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: Traceback (most recent call last):
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 91, in _get_all_pages
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: resp = self._request(url_part)
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 82, in _request
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: response.raise_for_status()
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/requests/models.py", line 893, in raise_for_status
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: raise HTTPError(http_error_msg, response=self)
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://docker-registry.discovery.wmnet/v2/_catalog?last=releng%2Fquibble-jessie-php55&n=100
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: During handling of the above exception, another exception occurred:
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: Traceback (most recent call last):
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/local/bin/registry-homepage-builder", line 136, in <module>
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: main()
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/local/bin/registry-homepage-builder", line 122, in main
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: for image, tags in sorted(registry.get_image_tags(sort=False).items()):
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/docker_report/registry/browser.py", line 61, in get_image_tags
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: for image_name in self._get_images_list():
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/docker_report/registry/browser.py", line 50, in _get_images_list
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: for resp in self._get_all_pages("/v2/_catalog"):
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: File "/usr/lib/python3/dist-packages/docker_report/registry/__init__.py", line 108, in _get_all_pages
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: raise RegistryError(url_part)
Jan 26 20:32:13 registry2002 registry-homepage-builder[22695]: docker_report.registry.RegistryError: /v2/_catalog?last=releng%2Fquibble-jessie-php55&n=100
This seems like the most foolproof thing to do, so I'll submit patches, but I still am curious why this is going wrong in this specific case only on registry2002.
Change 658684 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/docker-images/docker-report@master] Allow talking to the registry over HTTP
Change 658684 merged by jenkins-bot:
[operations/docker-images/docker-report@master] Allow talking to the registry over HTTP
Change 659095 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Have build-homepage talk directly to the registry
Change 659095 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Have build-homepage talk directly to the registry
Change 650215 abandoned by Ahmon Dancy:
[operations/puppet@production] Redirect top level URL to https://dockerregistry.toolforge.org/
Reason:
superseded by https://gerrit.wikimedia.org/r/c/operations/puppet/+/654725