[Toolforge] Generic webservice not working on Kubernetes
Open, Needs Triage · Public · BUG REPORT

Description

On the Kubernetes backend for Toolforge, the generic webservice type does not seem to be working. When passing --backend=gridengine, the following command works perfectly fine:

$ become qrank
$ webservice --backend=gridengine generic start /data/project/qrank/bin/qrank-webserver 
Starting webservice.....

Expected: With --backend=kubernetes, the service should get started on Kubernetes.
Actual:

$ webservice --backend=kubernetes generic start /data/project/qrank/bin/qrank-webserver 
type must be one of:
  * golang (DEPRECATED)
  * golang111
  * jdk11
  * jdk8 (DEPRECATED)
  * node10
  * nodejs (DEPRECATED)
  * php5.6 (DEPRECATED)
  * php7.2
  * php7.3
  * python (DEPRECATED)
  * python2 (DEPRECATED)
  * python3.5
  * python3.7
  * ruby2 (DEPRECATED)
  * ruby25
  * tcl

Event Timeline

There is no generic on kubernetes because there's no container image named that. The grid assumes you have a whole node with loads of packages installed.

With the Go programming language, binaries typically get statically linked. So, compiled programs will typically run without any runtime dependencies whatsoever — they wouldn’t access package files, call shared libraries, or use any other files. When compiling for Linux, the compiler builds an ELF binary that directly invokes the operating system kernel through Linux system calls, not even using libc or anything else in a Linux distribution. Rust may be similar in that respect (not sure); static linking can also be done with C and C++, although it’s a bit less common there.
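One rough way to check whether a given binary really is self-contained in this sense is to look for a `PT_INTERP` program header, which names the dynamic loader an ELF executable wants. The following is only a minimal sketch: the function name `needs_dynamic_loader` is hypothetical and not part of any Toolforge tooling, and the check is a heuristic (it does not detect libraries loaded at runtime via `dlopen`, for instance).

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader (e.g. ld-linux)


def needs_dynamic_loader(data: bytes) -> bool:
    """Return True if the ELF image in `data` has a PT_INTERP program header.

    Hypothetical helper for illustration only; a statically linked binary
    normally has no PT_INTERP entry, a dynamically linked one does.
    """
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian, 2 = big-endian
    # Header fields after the 16-byte e_ident, up to and including e_phnum:
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
    # e_flags, e_ehsize, e_phentsize, e_phnum
    fmt = endian + ("HHIQQQIHHH" if is_64bit else "HHIIIIIHHH")
    fields = struct.unpack_from(fmt, data, 16)
    e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
    for i in range(e_phnum):
        # p_type is the first 4-byte field of a program header in both ELF classes.
        (p_type,) = struct.unpack_from(endian + "I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

It could be called on the bytes of the binary from the report, for example `needs_dynamic_loader(open("/data/project/qrank/bin/qrank-webserver", "rb").read())`. A result of `False` suggests the binary does not ask for a dynamic loader and so should not depend on shared libraries from the container image.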

Apologies if I’m stating the obvious. I’m not trying to hype the Go language, or distroless containers, or anything else for that matter... just trying to explain why Toolforge might want to support the generic webservice type on Kubernetes.

You'll probably find https://wikitech.wikimedia.org/wiki/User:Legoktm/Rust_on_Toolforge#webservice helpful for golang stuff. Yes, your container will have all the golang development tools but it shouldn't affect your tool. T194953#6266496 is an idea on how to properly support statically compiled languages in k8s, but it might be moot when buildpacks come around.

I run two supporting services in that cluster using a "scratch" container that really has basically nothing but my Go binary in the image. It might not be the worst idea to create a static-binary container image that's really just there to run statically compiled languages like Go and Rust with as little pointless overhead as possible. It would likely not be the same as what we call "generic", which is really "generic gridengine-based service". In fact, we'd probably call it something more specific so people don't assume they get anything "for free" in it.

If you don't mind, I can rework this into a feature request to add an extremely lightweight webservice setup for static binaries. Ultimately, it's not a bug that "generic" doesn't work because that's a historical creature of the gridengine environment, but we probably could make more straightforward containerization for these languages.

+1. It would be nice if said container could also include libssl and libmariadb (just the .so files, not the -dev headers) since Rust programs still tend to link against those in my experience.

Toolforge is not today a general purpose Kubernetes hosting platform. It's a semi-opinionated PaaS with a lot of legacy debt related to its origins as a shared hosting platform in 2005.

We know that custom built containers are something that we want to support, and we have T194332 ("[Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs") as an active project moving in that direction. I'm having a hard time seeing the value in adding more container types to support not having a language runtime. The containers that @Bstorm is using for Toolforge infrastructure are also not using any webservice tooling, and there is deeper complexity here in the idea of making webservice work with a container that does not match the current expectations of an LDAP-derived runtime user, etc.

Honestly this task could be easily closed as invalid as it is reporting designed behavior that is enforced by the software and also documented on wikitech.

This would include all webservice tooling and look exactly like the ruby container without any of the ruby libraries. We have growing demand for compiled languages, and this would really just be a very basic Buster container otherwise set up to include the sssd stuff. The vast majority of what we have for supporting the Toolforge setup is defined in Kubernetes, not the image, except for the python and php ones. The others primarily feature a runtime.

The golang one would work, but it makes little sense because it's needlessly large: you don't need compilers to run Go binaries. We built that one a bit oddly.

This is just a matter of building a smaller and faster container for things that don't need much of an image, not a custom one.

I'm still not seeing the practical benefit to users of a toolforge-buster-sssd-web image. I understand that it would be fewer bytes to host this container than toolforge-ruby25-sssd-web or toolforge-golang111-sssd-web, but I really do not see why the tool maintainer would care about those bytes. The only argument that I can think of that makes any sense is one about reducing attack surface for Toolforge from the web due to an RCE vulnerability in the contained webservice.

It reduces pressure on the disk in the cluster to use smaller images (which I really like and would prefer more people used), reduces confusion about the use of the image in Toolforge, and reduces the temptation to run compiles on a cluster that is poorly equipped for that. It's also likely to require next-to-zero disk space in the registry because it'd mostly be the base image. Doesn't even need -web. We're talking -base only, likely lighter, because we include a lot of dev libs. Costs us little-to-nothing, makes me happy as admin and makes a few users happy. The more jobs we run, the more I will want lighter images where we can get away with them.

Change 673378 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/docker-images/toollabs-images@master] static-binaries: first pass at a stripped-down image for binaries

https://gerrit.wikimedia.org/r/673378

As @bd808 suspected, security is indeed the main reason why I’d like to run my dinky webservice in a constrained environment. As an external volunteer developer, I always fear that my contributions may cause more harm than good. Especially when contributing some minor tool that doesn’t see much attention, I can sleep better when there isn’t much else bundled into the container for my webservice. Of course, the risks can be mitigated with container scanning, actively checking CVEs, etc., but as an external volunteer, I don’t really want to impose such a maintenance burden on others. Keeping containers lean isn’t a universal solution to all problems in production security; still, with less baggage, fewer things can go wrong. Basically, it’s an attempt at taming the beast of system complexity.

Beyond security, I also wouldn’t want to waste production resources with my dinky tool, or add to the load of sysadmins, so (again from my perspective as an external volunteer) I’d completely agree with what @Bstorm wrote above.

Change 673378 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] static-binaries: first pass at a stripped-down image for binaries

https://gerrit.wikimedia.org/r/673378

Ok, @Legoktm and @Sascha the docker-registry.tools.wmflabs.org/toolforge-buster-standalone:latest image is up now. It can be tested in jobs or hand-made deployments for now until I get it in the webservice package. Are you able to test that it meets some use cases before I go do that, like in jobs or whatever?

Sure, glad to try. I’ve changed the qrank-builder job config to use the new image. It seems to work fine.