
Pushes to docker-registry fail for images with compressed layers of size >1GB
Closed, ResolvedPublic

Description

During stress testing of the docker-registry infrastructure, @akosiaris discovered that pushing images that contain layers with a compressed size of more than 1 GB fails (https://phabricator.wikimedia.org/T264209#6546056).

Just extracting the relevant discussion from T264209 into this task:

We need to permanently bump the tmpfs /var/lib/nginx size if we want to be able to consistently push images with blobs that are larger than 1 GB compressed

Couldn't we get around this by using a (bigger) non-tmpfs filesystem as client_body_temp_path?
Not sure how much the upload performance would suffer in that case, but we could test it...

+1 on this suggestion. For small requests, there will be minimal writing to a real filesystem for files that exist briefly. These writes would be background I/O in most cases.

The main issue will be that large pushes to the registry become slower, and hence CI will take longer overall. That said, we should probably try to find a combination of client_body_buffer_size and a larger but slower filesystem that addresses the most common patterns in our CI.

That being said, since compression on the client before the push also takes significant time (per people's reports in https://github.com/moby/moby/issues/1266), the delay from lower IOPS might not be the biggest contributing factor here (and there doesn't seem to be anything we can do about the compression time).

I'll try to devise a couple of tests to run to get numbers on this.

@dancy ran into this exact issue yesterday so @Joe went ahead and increased the tmpfs on registry2* from 1GB to 2GB which, given the registry VMs do have 4GB of memory (and barely use it), is a good quick way out of the misery.
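The nginx side of the suggestion above can be sketched roughly as follows. This is only an illustration of the two directives being discussed; the paths and sizes are assumptions, not the actual registry configuration:

```nginx
http {
    # Buffer small request bodies in memory; only bodies larger than
    # this spill to client_body_temp_path.
    client_body_buffer_size 1m;

    # Spill large uploads to a real on-disk filesystem instead of the
    # tmpfs at /var/lib/nginx (path here is an assumption).
    client_body_temp_path /srv/nginx/client_body 1 2;
}
```

With a setup like this, small registry requests never touch disk, while the occasional multi-gigabyte blob upload trades some IOPS for not being bounded by tmpfs (i.e. RAM) size.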

Event Timeline

JMeybohm triaged this task as High priority.

Change 710218 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] docker_registry_ha: Increase nginx tmpfs size from 1gb to 2gb

https://gerrit.wikimedia.org/r/710218

Change 710218 merged by JMeybohm:

[operations/puppet@production] docker_registry_ha: Increase nginx tmpfs size from 1gb to 2gb

https://gerrit.wikimedia.org/r/710218

tmpfs resized to 2GB on all registry nodes
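For reference, the resize amounts to a tmpfs mount-option change. A sketch, where the mount point matches the /var/lib/nginx path mentioned earlier but the exact options are assumptions:

```
# /etc/fstab sketch -- options are illustrative, not the puppetized config.
tmpfs  /var/lib/nginx  tmpfs  size=2g,mode=0755  0  0
```

A live resize without a remount-from-scratch is also possible with `mount -o remount,size=2g /var/lib/nginx`.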

JMeybohm added a subscriber: Legoktm.

AIUI from IRC backlog we had issues again @dancy / @Legoktm

10.64.48.17 - ci-restricted [14/Dec/2021:23:41:44 +0000] "PATCH /v2/restricted/mediawiki-multiversion/blobs/uploads/7b3d3607-f28f-406d-add1-a00b78be2458?_state=zW-HvRiQsOowai-0FBYoYd3B1nk2Uxb0BQrtueun4b57Ik5hbWUiOiJyZXN0cmljdGVkL21lZGlhd2lraS1tdWx0aXZlcnNpb24iLCJVVUlEIjoiN2IzZDM2MDctZjI4Zi00MDZkLWFkZDEtYTAwYjc4YmUyNDU4IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIxLTEyLTE0VDIzOjI1OjIyLjgyODUyNzI5WiJ9 HTTP/1.1" 500 193 "-" "docker/18.09.1 go/go1.11.6 git-commit/4c52b90 kernel/4.19.0-17-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.1 \x5C(linux\x5C))"

It looks like the situation has been mitigated again for now by fixing/shrinking the image.
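When debugging cases like this, one quick way to spot which layers blow the budget is to look at the compressed sizes reported in the image manifest: in a Docker Registry v2 manifest, each entry under "layers" gives the compressed blob size in bytes. A minimal sketch; the manifest below is made up, and the digests are placeholders:

```python
GIB = 1024 ** 3

def oversized_layers(manifest: dict, limit_bytes: int = GIB) -> list:
    """Return digests of layers whose compressed size exceeds limit_bytes."""
    return [
        layer["digest"]
        for layer in manifest.get("layers", [])
        if layer["size"] > limit_bytes
    ]

# Made-up example manifest, not a real image:
example_manifest = {
    "schemaVersion": 2,
    "layers": [
        {"digest": "sha256:aaa", "size": 120 * 1024 ** 2},  # 120 MiB: fine
        {"digest": "sha256:bbb", "size": 3 * GIB},          # 3 GiB: too big
    ],
}

print(oversized_layers(example_manifest))  # → ['sha256:bbb']
```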

JMeybohm lowered the priority of this task from High to Medium. Dec 15 2021, 1:19 PM

We're running into this again today. I tried the same workaround as before (removing a recent but inactive mediawiki checkout) but it didn't help this time.

Noting for the record that pushes magically started working again around 12:28 UTC (2021/01/19).

Removing myself as assignee as I'm not currently working on this.

Another case of failing to push a large image: T342084. Is it possible to configure NGINX to use a different (large, on-disk) storage area for certain URLs?


Use case from ML - we are porting the recommendation-api Python service from wmf-cloud to k8s. It uses a dict file containing embeddings (basically a serialized blob) that weighs around 3 GB (so one big layer when we add it) and is loaded when the application bootstraps. We are working with Research to figure out how to handle cases like this one, since embeddings could come in multiple shapes and sizes, and they'd need flexibility when they experiment (say, using a serialized blob as a starting point and then thinking about some specialized datastore). This is not an "experimentation" use case of course, but the recommendation-api's setup may vary in the future. We could think about other options if raising the tmpfs mount is not an option (like fetching from Swift or similar).


It certainly is possible to define different temp-path/caching directives for various nginx configuration stanzas, but what exactly do you have in mind?
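For the URL-specific idea, something along these lines would work, since client_body_temp_path is valid at http, server, and location level. A sketch only; the paths and upstream name are assumptions:

```nginx
server {
    # Small requests (manifests, HEAD checks) keep whatever default,
    # tmpfs-backed temp path is configured at http level.

    location ~ ^/v2/.+/blobs/uploads/ {
        # Large blob uploads spill to a bigger on-disk filesystem.
        client_body_temp_path /srv/nginx/blob-uploads 1 2;
        proxy_pass http://registry_backend;
    }
}
```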


So, we've met this embeddings question in the past, in the form of the word2vec package for ORES; see T188446 and T187217. I was against packaging that artifact in a Debian package or putting it in the git repo as-is, and I still think that was a good decision. git-lfs, however, despite looking promising, didn't pan out to be that great, for a variety of reasons.

Although we are talking about container images here, the same principles apply. Putting what is clearly data (and a lot of it) into containers is going to cause operational headaches (probably for multiple teams) down the line. Doing it any better way, like fetching it from Swift, or splitting it up, storing it in some datastore, and querying it over an API, is definitely a more sustainable path forward.
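The fetch-at-bootstrap approach can be sketched as below. The URL, paths, and function name are all assumptions; a real deployment would point at a Swift (or other object-storage) endpoint and would likely add checksum verification and retries:

```python
import os
import urllib.request

def fetch_artifact(url: str, dest: str) -> str:
    """Download an artifact to dest unless it is already present."""
    if not os.path.exists(dest):
        tmp = dest + ".part"
        urllib.request.urlretrieve(url, tmp)  # stream to a temp file
        os.replace(tmp, dest)                 # publish atomically
    return dest

# At application startup, something like (hypothetical URL and path):
#   path = fetch_artifact("https://swift.example.org/v1/embeddings.bin",
#                         "/var/cache/recommendation-api/embeddings.bin")
```

This keeps the image itself small (no 3 GB layer to push through the registry) and lets the data artifact be swapped without rebuilding the image.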

@akosiaris Hi! Getting back to the issue, this time in a different form T359067. We have followed your suggestion for the embeddings (now they live in swift), but Pytorch is now giving us some headaches so your opinion would be really appreciated :)

dancy claimed this task.

This should be covered by T404742.