
Create restricted docker-registry namespace for security patched images
Closed, Resolved · Public

Description

In order to publish MediaWiki images that have security patches applied, we'll need a namespace under the existing docker-registry.wikimedia.org that is restricted, denying public access, granting read/write access to releases-jenkins (for publishing), and granting read access to k8s nodes (for runtime).

Event Timeline

Legoktm added a subscriber: Legoktm.

Should the existence of these images also be secret? Or is it okay if they're listed on the docker-registry homepage, knowing that people won't be able to pull them? I'm not sure whether it's possible to fully hide them from the API responses; I'll need to look into the docs in more detail.

Also, is there a chance developers will ask to be able to pull images locally for debugging? I suppose that can be done by setting up a ssh tunnel to the right host.

I read through https://docs.docker.com/registry/spec/api/#pulling-an-image and https://docs.docker.com/registry/recipes/nginx/ (what we already implement) and my plan is to change the nginx config to:

  • allow GET/HEAD to /v2/restricted/ only from k8s nodes and releases-jenkins
  • allow POST/PUT/PATCH/DELETE (or perhaps without restricting by method at all) to /v2/restricted/ only from releases-jenkins

Anyone from the internet could do a GET /v2/_catalog to get a list of all the restricted images' names but they wouldn't be able to make the subsequent API requests to get tags or digests, etc.

Also, is there a chance developers will ask to be able to pull images locally for debugging? I suppose that can be done by setting up a ssh tunnel to the right host.

I can't think of a case for that at the moment. And yeah, if that becomes a need maybe we can open up access to deployment hosts as well and people (the same folks who can currently see /srv/patches) can tunnel.

I read through https://docs.docker.com/registry/spec/api/#pulling-an-image and https://docs.docker.com/registry/recipes/nginx/ (what we already implement) and my plan is to change the nginx config to:

  • allow GET/HEAD to /v2/restricted/ only from k8s nodes and releases-jenkins
  • allow POST/PUT/PATCH/DELETE (or perhaps without restricting by method at all) to /v2/restricted/ only from releases-jenkins

Nice! That seems right.

Anyone from the internet could do a GET /v2/_catalog to get a list of all the restricted images' names but they wouldn't be able to make the subsequent API requests to get tags or digests, etc.

If they can't even get labels/tags/digests, I think that's totally sufficient, as long as we don't put any sensitive information about security patches in image names (e.g. mediawiki-patched-for-T123-session-hijack), and I can't think of any reason we'd ever do that.

@dancy or @jeena, do you have thoughts on this?

Also, is there a chance developers will ask to be able to pull images locally for debugging? I suppose that can be done by setting up a ssh tunnel to the right host.

I read through https://docs.docker.com/registry/spec/api/#pulling-an-image and https://docs.docker.com/registry/recipes/nginx/ (what we already implement) and my plan is to change the nginx config to:

  • allow GET/HEAD to /v2/restricted/ only from k8s nodes and releases-jenkins
  • allow POST/PUT/PATCH/DELETE (or perhaps without restricting by method at all) to /v2/restricted/ only from releases-jenkins

Anyone from the internet could do a GET /v2/_catalog to get a list of all the restricted images' names but they wouldn't be able to make the subsequent API requests to get tags or digests, etc.

I would double-check whether we can support pulling with credentials in Kubernetes, but I think it's probably better to have password protection for those URLs, as we do for other things. It would spare us the pain of making a puppet patch plus a puppet run every time we add a new k8s node to the clusters.

So my proposal would boil down to having:

  • A user that can pull any image (so GET/HEAD to /v2/restricted/)
  • A user that can publish normal images (i.e. what we have right now)
  • A user that can publish any image, including to restricted/ (for use on the release servers)
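To sketch how three realms like these could map onto nginx, here is a simplified illustration, not the production template (that lives in puppet's docker_registry_ha module); the htpasswd paths and the "registry" upstream name are placeholders:

```nginx
# Simplified illustration only. Pushes require push credentials;
# pulls (GET/HEAD) only require read credentials.
location /v2/restricted/ {
    auth_basic "docker-registry restricted (restricted-push)";
    auth_basic_user_file /etc/nginx/restricted-push.htpasswd;

    # limit_except applies the inner directives to all methods
    # NOT listed, i.e. to GET/HEAD here.
    limit_except POST PUT PATCH DELETE {
        auth_basic "docker-registry restricted (restricted-read)";
        auth_basic_user_file /etc/nginx/restricted-read.htpasswd;
    }

    proxy_pass http://registry;
}
```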

Change 662807 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] [WIP] docker_registry_ha: Have restricted/ images that are limited read/write

https://gerrit.wikimedia.org/r/662807

So my proposal would boil down to having:

  • A user that can pull any image (so GET/HEAD to /v2/restricted/)
  • A user that can publish normal images (i.e. what we have right now)
  • A user that can publish any image, including to restricted/ (for use on the release servers)

This is what I attempted to implement.

I think we could make kubernetes use docker credentials distributed to the nodes by default, see https://kubernetes.io/docs/concepts/containers/images/#configuring-nodes-to-authenticate-to-a-private-registry

Is there a preference between node-level docker credentials and imagePullSecrets for this? The former seems straightforward to do with puppet; I'm not sure how to do the latter.

Is there a preference between node-level docker credentials and imagePullSecrets for this? The former seems straightforward to do with puppet; I'm not sure how to do the latter.

Definitely the former. The imagePullSecrets approach would require us to populate those secrets into every namespace, etc. It's way easier to distribute the docker config via puppet, as you said!

Change 663064 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] k8s: Add docker-registry credentials to pull restricted images

https://gerrit.wikimedia.org/r/663064

Change 662807 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Have restricted/ images that are limited read/write

https://gerrit.wikimedia.org/r/662807

Change 664683 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Have restricted/ images that are limited read/write (try 2)

https://gerrit.wikimedia.org/r/664683

So the first deploy didn't go well, because I hadn't tested the nginx configuration and didn't realize how intricate it is. This time I created a stripped-down version of the config (P14384) and a simple Dockerfile:

FROM nginx

RUN apt-get update && apt-get install -y apache2-utils

RUN htpasswd -cb /etc/nginx/regular-push.htpasswd user password
RUN htpasswd -cb /etc/nginx/restricted-push.htpasswd user password
RUN htpasswd -cb /etc/nginx/restricted-read.htpasswd user password
COPY nginx.conf /etc/nginx/conf.d/registry.conf

And then I ran a bunch of curl commands to verify I got 401s for the correct endpoints/methods (and 404s for others), and the right credentials were prompted for:

km@cashew ~/t/registry-nginx-test> curl -I localhost:8080/v2
HTTP/1.1 404 Not Found

km@cashew ~/t/registry-nginx-test> curl -I localhost:8080/v2/restricted/foo
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="docker-registry restricted (restricted-read)"

km@cashew ~/t/registry-nginx-test> curl -I -X POST localhost:8080/v2/restricted/foo
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="docker-registry restricted (restricted-push)"

km@cashew ~/t/registry-nginx-test> curl -I -X POST localhost:8080/v2/foo
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="docker-registry (regular-push)"

I think we could make kubernetes use docker credentials distributed to the nodes by default, see https://kubernetes.io/docs/concepts/containers/images/#configuring-nodes-to-authenticate-to-a-private-registry

Heh, TIL. I was afraid we would have to end up with imagePullSecrets

One thing we'll need to keep in mind, however, is that when we switch container runtime engines we might need to adapt this (I think containerd uses this as-is; I'm not sure about CRI-O).

One extra thing to always keep in mind with this is that now all requests from k8s nodes to the registry will be authenticated. If anyone could instruct the nodes to pull even a single layer from a different registry, one that they own and is malicious, they would be able to read the password in the HTTP requests.

One extra thing to always keep in mind with this is that now all requests from k8s nodes to the registry will be authenticated. If anyone could instruct the nodes to pull even a single layer from a different registry, one that they own and is malicious, they would be able to read the password in the HTTP requests.

It looks to me like docker stores credentials on a per-repository basis, so credentials for repository X will not be used to authenticate with repository Y. @akosiaris, do you concur?

One extra thing to always keep in mind with this is that now all requests from k8s nodes to the registry will be authenticated. If anyone could instruct the nodes to pull even a single layer from a different registry, one that they own and is malicious, they would be able to read the password in the HTTP requests.

It looks to me like docker stores credentials on a per-repository basis, so credentials for repository X will not be used to authenticate with repository Y. @akosiaris, do you concur?

Right, the file looks like:

{
	"auths": {
		"https://docker-registry.discovery.wmnet": {
			"auth": "..."
		}
	}
}
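(For reference, the auth value is just the base64 encoding of user:password; the credentials below are placeholders, not real values:)

```shell
# "auth" in docker's config.json is base64("user:password");
# "user:password" here is a placeholder credential.
printf '%s' 'user:password' | base64
# → dXNlcjpwYXNzd29yZA==
```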

One extra thing to always keep in mind with this is that now all requests from k8s nodes to the registry will be authenticated. If anyone could instruct the nodes to pull even a single layer from a different registry, one that they own and is malicious, they would be able to read the password in the HTTP requests.

It looks to me like docker stores credentials on a per-repository basis, so credentials for repository X will not be used to authenticate with repository Y. @akosiaris, do you concur?

It was actually a vulnerability I was referring to, but I failed to communicate that. I'm sorry for not having pasted the CVE; it's CVE-2020-15157. A nice, detailed writeup is at https://darkbit.io/blog/cve-2020-15157-containerdrip

NVD has it under https://nvd.nist.gov/vuln/detail/CVE-2020-15157 and says "Other container runtimes built on top of containerd but not using the default resolver (such as Docker) are not affected." We are currently using Docker, not containerd, so this doesn't currently apply to us; however, we will have to migrate to containerd (or CRI-O) eventually, as Docker support is deprecated in Kubernetes starting with version 1.21 and will probably be removed entirely in 1.24.

That being said, our Docker version is old (depending on the cluster, it's either 1.12.6-0~debian-jessie or 18.06.3~ce~3-0~debian, though we are aiming to soon have only the latter) because of the need for LVM devicemapper support, a choice made back in 2016, so it's not inconceivable that similar vulnerabilities exist. I think we are somewhat well guarded against such an attack in our environment, however.

By the way, we aren't really tied to LVM devicemapper support (there were simply no better alternatives back then), and the migration to another container runtime engine will probably also move away from that choice and onto overlay2.

Right, the file looks like:

{
	"auths": {
		"https://docker-registry.discovery.wmnet": {
			"auth": "..."
		}
	}
}

Thanks for the reference @akosiaris. That was a good read.

Change 664683 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Have restricted/ images that are limited read/write (try 2)

https://gerrit.wikimedia.org/r/664683

The restricted/ namespace is now live. Where should I put the credentials for the ci-restricted user?

Change 666722 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] releases: Add docker-registry credentials to push restricted images

https://gerrit.wikimedia.org/r/666722

Change 666722 merged by Legoktm:
[operations/puppet@production] releases: Add docker-registry credentials to push restricted images

https://gerrit.wikimedia.org/r/666722

The restricted/ namespace is now live. Where should I put the credentials for the ci-restricted user?

Sorry for the delayed response. (I've had a bit of tunnel vision getting the multiversion image built.)

PipelineLib currently calls out to a simple wrapper script around docker push (see operations/puppet/modules/docker_pusher) that is invoked via sudo to limit exposure of the credentials stored in its configuration file.

Will the new credentials also work for pushing to the non-restricted namespace? If so, I think we can just replace/overwrite the old credential for just the releases hosts in the hiera data.

In other words, override the value of this secret for releases hosts with the new credentials.
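In hiera terms, the override suggested here might look something like the fragment below; the path and key name are hypothetical, the real key is whatever secret docker_pusher currently reads:

```yaml
# hieradata/role/common/releases.yaml (hypothetical path and key name)
profile::docker_pusher::docker_password: '<restricted-push password>'
```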

PipelineLib currently calls out to a simple wrapper script around docker push (see operations/puppet/modules/docker_pusher) that is invoked via sudo to limit exposure of the credentials stored in its configuration file.

Will the new credentials also work for pushing to the non-restricted namespace? If so, I think we can just replace/overwrite the old credential for just the releases hosts in the hiera data.

Yep, they will. I had already dropped the new credentials into /root/.docker/config.json; let me move that to /etc/docker-pusher/config.json now.

Change 667682 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_pusher: Use new docker-registry credentials

https://gerrit.wikimedia.org/r/667682

Change 667682 merged by Legoktm:
[operations/puppet@production] docker_pusher: Use new docker-registry credentials

https://gerrit.wikimedia.org/r/667682

Success! I was able to build and push an image via docker-pusher on releases1002 to the restricted path (docker-registry.wikimedia.org/restricted/mediawiki-multiversion:2021-03-11-213947-production). Thanks for your hard work on this, @Legoktm.

Change 663064 merged by Legoktm:
[operations/puppet@production] k8s: Add docker-registry credentials to pull restricted images

https://gerrit.wikimedia.org/r/663064

One remaining issue: the k8s nodes can't pull restricted images yet, because they aren't in the allow_push_from IP allowlist in the nginx configuration: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/docker_registry_ha/templates/registry-nginx.conf.erb#81 - once I added the kubestage2001 IP there, it could start pulling images just fine.

I assume we don't want to manually maintain an IP list of every k8s node - can puppet provision the IPs for us?
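If we did want puppet to maintain such a list, the ERB template could in principle render one allow rule per node; a rough sketch only, where the variable name and how it gets populated (puppetdb query, hiera) are assumptions, not the actual implementation:

```erb
<%# Hypothetical sketch: render one allow rule per k8s node. -%>
<%# @k8s_node_ips would be resolved via puppetdb or hiera. -%>
location /v2/restricted/ {
<% @k8s_node_ips.each do |ip| -%>
    allow <%= ip %>;
<% end -%>
    deny all;
}
```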

Change 672537 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] docker_registry_ha: Require authentication from k8s nodes

https://gerrit.wikimedia.org/r/672537

Change 672537 merged by Legoktm:
[operations/puppet@production] docker_registry_ha: Require authentication from k8s nodes

https://gerrit.wikimedia.org/r/672537

OK, we really should be done now :) Note that currently only the codfw kubestage nodes can pull restricted/ images, we will add the credentials to the rest of the nodes when it's needed.

OK, we really should be done now :) Note that currently only the codfw kubestage nodes can pull restricted/ images, we will add the credentials to the rest of the nodes when it's needed.

Maybe you can do so for staging-eqiad already, just to have both staging servers configured equally and to exercise the setup a bit.

OK, we really should be done now :) Note that currently only the codfw kubestage nodes can pull restricted/ images, we will add the credentials to the rest of the nodes when it's needed.

Maybe you can do so for staging-eqiad already, just to have both staging servers configured equally and to exercise the setup a bit.

Done.

Change 692881 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[labs/private@master] docker_kubernetes_user_password is in common for staging

https://gerrit.wikimedia.org/r/692881

Change 692881 merged by JMeybohm:

[labs/private@master] docker_kubernetes_user_password is in common for staging

https://gerrit.wikimedia.org/r/692881

I needed the credentials on production nodes now, so I have set the password for the production clusters as well.

Change 694552 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] httpbb: Add tests for docker-registry

https://gerrit.wikimedia.org/r/694552

Change 694552 merged by JMeybohm:

[operations/puppet@production] httpbb: Add tests for docker-registry

https://gerrit.wikimedia.org/r/694552