
Container image reports in debmonitor are broken
Open, High, Public

Description

I tried to check on images for T348647 and found that:

  • debmonitor is lacking most of our images (basically everything that should be handled by k8s_rules.ini IIUC)
  • debmonitor does not contain any images that are built by gitlab pipelines
  • docker-report is painfully slow, taking >5min to fetch the docker-registry-catalog (which probably is worth another task)
  • docker-report uses docker-registry.wikimedia.org as registry, so everything goes via CDN (the catalog is not cached IIRC)
  • debmonitor data is hard to read as there are so many images that we never ran and probably will never run in prod

To work around the immediate issue, I created a list of all images currently running in prod and am now running docker-report-debmonitor for each of them on build2001:

# Collect the unique images of all running pods, rewritten to the public registry name
kubectl get pods --all-namespaces --field-selector=status.phase=Running -o jsonpath="{..image}" | tr ' ' '\n' | sed 's/docker-registry\.discovery\.wmnet/docker-registry.wikimedia.org/' | sort -u > prod_images

# on build2001
export http_proxy=http://webproxy.codfw.wmnet:8080
mkdir /tmp/tmpxczldyc8-docker-report
chgrp debmonitor /tmp/tmpxczldyc8-docker-report
chmod 0770 /tmp/tmpxczldyc8-docker-report
# Report each image; read -r keeps backslashes in the input literal
while read -r image; do docker-report-debmonitor "$image" /tmp/tmpxczldyc8-docker-report; done < prod_images

While running that, I realized it would be nice if docker-report-debmonitor could check whether an image has already been submitted to debmonitor (so as not to submit it again). It does not seem possible to GET or HEAD URLs like debmonitor.discovery.wmnet/images/docker-registry.wikimedia.org/httpd-fcgi:2.4.38-10-u4-20231009, even when authenticating with the host's client cert.
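Until debmonitor can be queried for this, a purely client-side workaround is possible. The following is a minimal sketch, not part of docker-report: the function name report_once and the state file are hypothetical, and the record lives only on the host running the loop, so it says nothing about what debmonitor actually holds.

```shell
# Hypothetical helper: run a command once per image, skipping images that are
# already listed in a local state file. This does not query debmonitor.
report_once() (
    images_file=$1   # one image reference per line
    done_file=$2     # local record of images already reported
    shift 2          # remaining arguments: the command to run for each image
    touch "$done_file"
    while IFS= read -r image; do
        [ -n "$image" ] || continue
        # -F fixed string, -x whole-line match, -q no output
        if grep -Fxq -- "$image" "$done_file"; then
            continue
        fi
        # Record the image only if the command succeeded. stdin is redirected
        # so the command cannot consume the rest of the image list.
        if "$@" "$image" < /dev/null; then
            printf '%s\n' "$image" >> "$done_file"
        fi
    done < "$images_file"
)

# Usage on build2001 (the wrapper name "report" is an assumption):
#   report() { docker-report-debmonitor "$1" /tmp/tmpxczldyc8-docker-report; }
#   report_once prod_images reported_images report
```

This mainly helps with resuming an interrupted run. A record keyed on the tag alone would skip a tag that has been re-pushed with different content.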

Event Timeline

The original idea for reporting images to debmonitor was that they should be reported at creation time and, given their immutability, should not need to be reported again until deletion. Given the lack of a way to properly clean them up, the current implementation, as you know, is different.

debmonitor is lacking most of our images (basically everything that should be handled by k8s_rules.ini IIUC)

Did you find why we're missing images?

debmonitor data is hard to read as there are so many images that we never ran and probably will never run in prod

This is also aggravated by the lack of image garbage collection when an image is no longer deployed and is obsolete. In the absence of that, we've implemented an automatic GC after 90 days.
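To make the age-based rule concrete, here is a minimal sketch of a 90-day staleness check. It is not Debmonitor's code; the function name is_stale is hypothetical and timestamps are assumed to be Unix epoch seconds.

```shell
# Hypothetical sketch: succeed if a record's last update is older than the
# cutoff. Arguments: last update and current time in epoch seconds, plus an
# optional maximum age in days (default 90).
is_stale() (
    last_updated=$1
    now=$2
    max_age_days=${3:-90}
    [ $(( now - last_updated )) -gt $(( max_age_days * 86400 )) ]
)

# Usage (date -d is GNU date syntax):
#   if is_stale "$(date -d '2023-07-01' +%s)" "$(date +%s)"; then echo stale; fi
```

Under a rule like this, an image that is reported once and never again would eventually age out even if it is still running, which is presumably why images are currently re-reported.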

While running that I realized it would be nice if docker-report-debmonitor could check if the image has already been submitted to debmonitor (to not do it again).

This would also require that the GC of the images is done by the image management lifecycle and not by Debmonitor solely based on the last updated datetime.

It does not seem possible to GET or HEAD URLs like debmonitor.discovery.wmnet/images/docker-registry.wikimedia.org/httpd-fcgi:2.4.38-10-u4-20231009, even when authenticating with the host's client cert.

Correct, that's the current implementation: the website is accessible to authenticated users, while the update/delete methods are accessible only via host client certs. It's all managed by the verify_clients decorator.
If needed I guess we could allow the HEAD method for the images detail view to be accessible also via client host auth.
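If HEAD were allowed with host client auth, the client-side check could look roughly like the sketch below. This does not work against the current implementation. The https scheme, the certificate and key paths, and the assumption that an unknown image answers with 404 are all mine, and both function names are hypothetical.

```shell
# Map an HTTP status code to a coarse result.
# Assumption: 2xx means the image is known, 404 means it is not.
image_status_from_code() {
    case "$1" in
        2??) echo known ;;
        404) echo unknown ;;
        *)   echo error ;;
    esac
}

# Hypothetical: send a HEAD request for the image detail URL using the host
# client certificate. The cert and key paths are supplied by the caller.
debmonitor_image_status() (
    image=$1
    cert=$2
    key=$3
    code=$(curl --silent --head --output /dev/null --write-out '%{http_code}' \
        --cert "$cert" --key "$key" \
        "https://debmonitor.discovery.wmnet/images/${image}") || code=000
    image_status_from_code "$code"
)

# Usage (paths are placeholders):
#   debmonitor_image_status "$image" /path/to/host-cert.pem /path/to/host-key.pem
```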

Change 966151 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] docker-report: Add exclude action to filter rule

https://gerrit.wikimedia.org/r/966151

Change 966200 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/docker-images/docker-report@master] Allow to only report images of supported Debian versions

https://gerrit.wikimedia.org/r/966200

The original idea for reporting images to debmonitor was that they should be reported at creation time and, given their immutability, should not need to be reported again until deletion. Given the lack of a way to properly clean them up, the current implementation, as you know, is different.

Unfortunately, immutability is not technically enforced, so some tags do change. I don't think this problem is easily solved, but I wanted to list my findings somewhere so we can pick them up when designing a better system.

debmonitor is lacking most of our images (basically everything that should be handled by k8s_rules.ini IIUC)

Did you find why we're missing images?

Yes, broken filter rule (https://gerrit.wikimedia.org/r/c/operations/puppet/+/966151/)

I've also created a patch for docker-report to ignore unsupported Debian versions without manual filters: https://gerrit.wikimedia.org/r/c/operations/docker-images/docker-report/+/966200/
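As an illustration of the idea only (this is not the code in the patch above, and how docker-report determines the Debian version is not shown here), a version filter could be sketched as follows. The supported list and both function names are assumptions, as is reading the codename from os-release content.

```shell
# Assumed list of supported Debian codenames, for illustration only.
SUPPORTED_CODENAMES="bullseye bookworm"

# Read os-release content on stdin and print the VERSION_CODENAME value.
# Prints nothing if the field is absent.
codename_from_os_release() {
    sed -n 's/^VERSION_CODENAME=//p' | tr -d '"' | head -n 1
}

# Succeed if the given codename is in the supported list.
is_supported_codename() {
    case " $SUPPORTED_CODENAMES " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   codename=$(codename_from_os_release < /etc/os-release)
#   is_supported_codename "$codename" && echo "report this image"
```

An image whose os-release has no VERSION_CODENAME yields an empty codename, which this sketch treats as unsupported.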

Change 966151 merged by JMeybohm:

[operations/puppet@production] docker-report: Add exclude action to filter rule

https://gerrit.wikimedia.org/r/966151

Jelto subscribed.

debmonitor does not contain any images that are built by gitlab pipelines

Adding GitLab tag.

Change 982793 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] docker-report: Fix stretch images regex

https://gerrit.wikimedia.org/r/982793

Change 982793 merged by Clément Goubert:

[operations/puppet@production] docker-report: Fix stretch images regex

https://gerrit.wikimedia.org/r/982793