Remove Python/webservice-runner from toolforge web containers
Closed, Resolved · Public

Assigned To

Authored By

	taavi
	Oct 16 2021, 12:43 PM

Description

Toolforge web containers have webservice-runner installed, which pulls in Python 3 as a dependency. This adds unnecessary weight to the containers and occasionally causes problems such as T287421. It would be nice to remove the Python runtime from non-Python containers before the next set of containers is created (likely either for the Debian 12/bookworm release, or if production wikis move to PHP 8.0 or newer and we get a nice repo to build PHP 8.0+ images from).

Details

Other Assignee: Legoktm

Subject	Repo	Branch	Lines +/-
Convert remaining images to shell webservice-runner	operations/docker-images/toollabs-images	master	+5 -15
shared: lighttpd: fix override file path	operations/docker-images/toollabs-images	master	+2 -2
Use shell webservice-runner for jdk17, ruby27 images	operations/docker-images/toollabs-images	master	+90 -2
Use shell webservice-runner for remaining nodejs images	operations/docker-images/toollabs-images	master	+39 -3
Use shell webservice-runner for golang111 image	operations/docker-images/toollabs-images	master	+45 -1
Use shell webservice-runner for node16 image	operations/docker-images/toollabs-images	master	+13 -1
Use shell webservice-runner for python35/python37 images	operations/docker-images/toollabs-images	master	+84 -2
python39: Use shell reimplementation of webservice-runner	operations/docker-images/toollabs-images	master	+42 -1

Related Objects

Status	Assigned	Task
Resolved	taavi	T335352 Provide a PHP 8.2 image for Kubernetes toolforge-jobs and webservice
Resolved	taavi	T335507 Build Bookworm based Toolforge Kubernetes images
Resolved	taavi	T358320 [toolforge-webservice] Remove old webservice-runner code
Resolved	taavi	T293552 Remove Python/webservice-runner from toolforge web containers

Event Timeline

taavi created this task.Oct 16 2021, 12:43 PM

Restricted Application added a subscriber: Aklapper.Oct 16 2021, 12:43 PM

taavi mentioned this in T194953: Support hosting Rust tools on Toolforge.Nov 9 2021, 7:54 AM

There are essentially two ways to get this done:

Re-implement webservice-runner in a language like Go or Rust that does not require pulling in a runtime as a dependency
Update webservice to specify the command and any flags by itself, and fully remove the dependency on a runner command inside the container

If possible, getting rid of the runner entirely seems like the best long-term option.

Legoktm awarded a token.Nov 9 2021, 4:38 PM

What burden does the "unnecessary weight" of python3 in each container create? Is this about bytes stored somewhere, conceptual complexity, runtime performance, or something else entirely?

Inside a Kubernetes container, webservice-runner becomes the "pid 1" process of the container and then launches whatever particular runtime is needed as a subprocess. What that actually entails varies a bit from runtime to runtime, but it is usually a matter of setting up some configuration for a service by compositing built-in and mounted config files (lighttpd config, uwsgi config, etc.) and then starting the service itself.

Instead of one python based entrypoint script that knows how to start everything we could put a runtime language specific entrypoint into each container, but honestly I'm not sure if that will make a whole lot of difference in any measure other than bytes on disk. I think the long term solution is moving to custom containers built via build packs or something similar where the configuration and entrypoint end up baked into the container image based on the stack for that container. I would much rather see energy put into a new future that gives us an exit from grid engine than into shaving a few megabytes off of the container sizes (assuming that's the weight to be removed).

In T293552#7493504, @bd808 wrote:

What burden does the "unnecessary weight" of python3 in each container create? Is this about bytes stored somewhere, conceptual complexity, runtime performance, or something else entirely?

Mostly I am concerned with accidental dependencies being included in the container that end up being part of the containers' "stable interface" that users expect, whether it's Python 3 itself or the system libraries it pulls in (IIRC we had some issues during the switch from Python 2 -> 3). The size reduction is just a side benefit for me.

In the proposed case of a standalone image (see T194953#7491536, no separate ticket yet) it would somewhat defeat the point if said standalone image had many system libraries plus Python 3 installed. It wouldn't defeat the point entirely, but there would be no benefit to using a standalone image over a plain Python 3 image.

Instead of one python based entrypoint script that knows how to start everything we could put a runtime language specific entrypoint into each container, but honestly I'm not sure if that will make a whole lot of difference in any measure other than bytes on disk. I think the long term solution is moving to custom containers built via build packs or something similar where the configuration and entrypoint end up baked into the container image based on the stack for that container. I would much rather see energy put into a new future that gives us an exit from grid engine than into shaving a few megabytes off of the container sizes (assuming that's the weight to be removed).

Agreed on the long-term direction. I do think there is some value in splitting out the runner component or some refactoring to make it better isolated, because that is what will need to end up in the various buildpacks AIUI (ex: https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/toolforge/buildpacks/+/refs/heads/master/uwsgi/). Mostly it depends on how much work this ends up being, which I don't have a good idea of since it's been a year since I last closely looked into this. I suspect that some of these could easily be taken care of by a shell script, but again, need to look a bit more closely first.

Current implementation, by wstype:

generic - runs whatever extra_args describes
js - {/usr/local/bin/npm, /usr/bin/npm} start with cwd set to $HOME/www/js
lighttpd-plain - builds /var/run/lighttpd/$TOOL config file possibly with $HOME/.lighttpd.conf concatenated; runs /usr/sbin/lighttpd -f /var/run/lighttpd/$TOOL -D
lighttpd - same as lighttpd-plain but with php fastcgi config added before $HOME/.lighttpd.conf
python - runs a long /usr/bin/uwsgi ... command with optional --venv and --ini args based on files seen on disk
tomcat - /usr/bin/deprecated-tomcat-starter; only used on the grid engine?
uwsgi - /usr/bin/uwsgi --http-socket :$PORT --logto $HOME/uwsgi.log --ini $HOME/uwsgi.ini ...; only used on grid engine?

So ... yeah, I think this could be replaced with a bit of bash scripting really. The lighttpd and python variants do the most setup work, but really not much.
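To illustrate how little setup the lighttpd and python variants need, here is a minimal shell sketch of the two steps described in the list above: concatenating a base lighttpd config with an optional $HOME/.lighttpd.conf, and choosing the optional --venv / --ini arguments for uwsgi based on what exists on disk. This is an illustration under assumptions, not the existing runner code: the function names are made up, and the base config path and the venv/ini locations are left to the caller because the list above does not spell them out.

```shell
# Sketch only: helper names and the paths passed in are hypothetical.

# Compose a lighttpd config from a built-in base file plus the tool's
# optional override file, as described for the lighttpd-plain type.
build_lighttpd_config() {
    local base_conf="$1"   # built-in config shipped in the image (assumed path)
    local home_dir="$2"    # the tool's $HOME
    local out_file="$3"    # e.g. /var/run/lighttpd/$TOOL

    cat "$base_conf" > "$out_file"
    # Append the tool's own overrides only if the file exists.
    if [ -f "$home_dir/.lighttpd.conf" ]; then
        cat "$home_dir/.lighttpd.conf" >> "$out_file"
    fi
}

# Print the optional uwsgi arguments, one per line, depending on which
# of the candidate paths exist. Prints nothing if neither exists.
uwsgi_optional_args() {
    local venv_dir="$1"    # candidate virtualenv directory (caller's choice)
    local ini_file="$2"    # candidate uwsgi ini file (caller's choice)

    if [ -d "$venv_dir" ]; then
        printf '%s\n' --venv "$venv_dir"
    fi
    if [ -f "$ini_file" ]; then
        printf '%s\n' --ini "$ini_file"
    fi
}
```

In an entrypoint, the first helper would be followed by something like exec /usr/sbin/lighttpd -f /var/run/lighttpd/$TOOL -D, so that the web server replaces the shell rather than running as its child.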

Change 738503 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/docker-images/toollabs-images@master] python39: Use shell reimplementation of webservice-runner

https://gerrit.wikimedia.org/r/738503

gerritbot added a project: Patch-For-Review.Nov 13 2021, 8:33 AM

I did Python 3.9 as a proof-of-concept, the same code should be reusable for the other Python versions.

generic seems simple but I'm not sure if we need to do anything special with quoting args or whatever. That's mostly why I didn't submit a WIP patch adding a standalone image using a generic shell webservice-runner.

Change 738503 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] python39: Use shell reimplementation of webservice-runner

https://gerrit.wikimedia.org/r/738503

taavi claimed this task.Aug 25 2022, 10:39 AM

Mentioned in SAL (#wikimedia-cloud) [2022-08-25T10:40:11Z] <taavi> tagged new version of the python39-web container with a shell implementation of webservice-runner T293552

Maintenance_bot removed a project: Patch-For-Review.Aug 25 2022, 11:30 AM

Legoktm mentioned this in rODIT9db8bc6b989b: python39: Use shell reimplementation of webservice-runner.Aug 25 2022, 4:15 PM

Change 827007 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for python35/python37 images

https://gerrit.wikimedia.org/r/827007

gerritbot added a project: Patch-For-Review.Aug 28 2022, 9:52 PM

Change 827009 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for node16 image

https://gerrit.wikimedia.org/r/827009

Thanks @taavi for deploying it, seems it worked fine! I put up a patch for the rest of the Python images and an initial patch for the newest node version if you want to review those :).

For the generic version, given an extra_args of [www/rust/run.sh], the runner is invoked as [/usr/bin/webservice-runner, --type, generic, --port, 8000, www/rust/run.sh]

We already hardcode/assume port 8000, so we can safely ignore the first 4 arguments. How defensive do we want to be in this processing? Should we properly implement arg parsing of --type and --port in bash? Or can we assume they'll always be in that order and just skip the first 4 arguments? My current thinking is that if we do proper arg parsing for those two options it makes it easy to drop them in the future without needing to change the bash runner at the same time.
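For comparison, the "just skip the first 4 arguments" option could look roughly like the sketch below. It assumes the runner is always invoked with the fixed prefix --type <type> --port <port> shown above; the function name is hypothetical, and the sanity check on the prefix is an addition that goes slightly beyond blindly shifting.

```shell
# Sketch only: positional handling of the fixed argument prefix.
# Assumes invocation as: --type <type> --port <port> <command> [args...]
generic_command_positional() {
    # Refuse to guess if the prefix is not in the expected shape.
    if [ "$#" -lt 5 ] || [ "$1" != "--type" ] || [ "$3" != "--port" ]; then
        echo "unexpected arguments: $*" >&2
        return 1
    fi
    shift 4
    # Print the remaining command, one argument per line.
    printf '%s\n' "$@"
}
```

The weakness is the one raised above: if webservice ever stops passing --type or --port, or reorders them, this version has to change in lockstep.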

Change 827007 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for python35/python37 images

https://gerrit.wikimedia.org/r/827007

Legoktm mentioned this in rODIT2231fc258df4: Use shell webservice-runner for python35/python37 images.Aug 29 2022, 4:28 PM

Change 829107 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for golang111 image

https://gerrit.wikimedia.org/r/829107

In T293552#8192342, @Legoktm wrote:

For the generic version, given an extra_args of [www/rust/run.sh], the runner is invoked as [/usr/bin/webservice-runner, --type, generic, --port, 8000, www/rust/run.sh]

We already hardcode/assume port 8000, so we can safely ignore the first 4 arguments. How defensive do we want to be in this processing? Should we properly implement arg parsing of --type and --port in bash? Or can we assume they'll always be in that order and just skip the first 4 arguments? My current thinking is that if we do proper arg parsing for those two options it makes it easy to drop them in the future without needing to change the bash runner at the same time.

I implemented "proper" arg parsing of --type and --port, it wasn't too bad actually.
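A minimal sketch of what order-independent parsing of those two options can look like in shell follows. This is not the code from the patch: the function and variable names are hypothetical, and the default port of 8000 is taken from the assumption stated in the quoted comment.

```shell
# Sketch only: parse leading --type/--port options in any order and
# report how many arguments were consumed. Names are hypothetical.
parse_runner_args() {
    WS_TYPE=""
    WS_PORT="8000"   # port the thread says is already assumed
    WS_CONSUMED=0    # number of leading arguments that were options
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --type|--port)
                if [ "$#" -lt 2 ]; then
                    echo "$1 needs a value" >&2
                    return 1
                fi
                if [ "$1" = "--type" ]; then
                    WS_TYPE="$2"
                else
                    WS_PORT="$2"
                fi
                shift 2
                WS_CONSUMED=$((WS_CONSUMED + 2))
                ;;
            *)
                # First non-option argument starts the command to run.
                break
                ;;
        esac
    done
}

# Intended use in an entrypoint (not executed here):
#   parse_runner_args "$@" || exit 1
#   shift "$WS_CONSUMED"
#   exec "$@"
```

Because the command is left in "$@" and run with exec "$@", arguments containing spaces are passed through unchanged, which covers the quoting worry for the generic type, and either option can later be dropped from the caller without touching this script.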

• bd808 moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.Sep 27 2022, 9:31 PM

Change 827009 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for node16 image

https://gerrit.wikimedia.org/r/827009

Legoktm mentioned this in rODITacb499b5df2e: Use shell webservice-runner for node16 image.Oct 8 2022, 6:23 PM

Mentioned in SAL (#wikimedia-cloud) [2022-10-12T23:25:34Z] <bd808> Rebuilding all Toolforge docker images (T278436, T311466, T293552)

Stashbot mentioned this in T278436: Toolforge: clarify usefullness of 'deb-tools.wmflabs.org' and refresh it if so.Oct 12 2022, 11:25 PM

Stashbot mentioned this in T311466: Create a kubernetes container with mono and dotnet.

Change 872499 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for remaining nodejs images

https://gerrit.wikimedia.org/r/872499

Legoktm updated the task description.Dec 28 2022, 12:45 AM

Legoktm updated the task description.Dec 28 2022, 12:56 AM

Change 829107 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for golang111 image

https://gerrit.wikimedia.org/r/829107

Change 872499 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for remaining nodejs images

https://gerrit.wikimedia.org/r/872499

Legoktm mentioned this in rODIT65519d0e50f4: Use shell webservice-runner for golang111 image.Jan 1 2023, 12:42 PM

Legoktm mentioned this in rODITf53b4e1a5d15: Use shell webservice-runner for remaining nodejs images.

fnegri edited projects, added cloud-services-team; removed cloud-services-team (Kanban).Jan 18 2023, 7:25 PM

fnegri moved this task from Kanban to Doing? (legacy column) on the cloud-services-team board.

fnegri moved this task from Doing? (legacy column) to Inbox on the cloud-services-team board.Jan 19 2023, 1:02 PM

Marked T335507: Build Bookworm based Toolforge Kubernetes images as a parent task since I'm aiming to get rid of the Python dependency in the Bookworm based images.

Change 912868 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for jdk17, ruby27 images

https://gerrit.wikimedia.org/r/912868

Change 912868 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] Use shell webservice-runner for jdk17, ruby27 images

https://gerrit.wikimedia.org/r/912868

Maintenance_bot removed a project: Patch-For-Review.Apr 27 2023, 8:30 PM

• bd808 updated the task description.Apr 27 2023, 10:33 PM

• bd808 updated the task description.Apr 27 2023, 10:44 PM

• bd808 updated the task description.

• bd808 moved this task from Backlog to In Progress on the Toolforge board.Apr 27 2023, 11:24 PM

• bd808 mentioned this in rODIT1f146ff9f03f: Use shell webservice-runner for jdk17, ruby27 images.Apr 28 2023, 12:11 AM

taavi updated Other Assignee, added: Legoktm.May 7 2023, 12:55 PM

We basically only have the lighttpd based images remaining. For those, the start script and config needed are a bit longer, so I'm wondering if we want to take a different approach than copy-pasting the same script to all the images. A separate binary package in the toolforge-webservice package might be one option, and a multi-stage Docker build would be another. Thoughts?

In T293552#8832014, @taavi wrote:

A separate binary package in the toolforge-webservice package might be one option

This seems like a lot of overhead for small gain to me, but I'm willing to hear what important thing I'm missing in that.

a multi-stage Docker build would be another

I'm assuming that by this you mean using a published container image as the means of centralization? Something like:

COPY --from=docker-registry.tools.wmflabs.org/toolforge-webservice-runner:latest /srv/app/webservice-runner /usr/bin/

Thoughts?

The copy from container idea sounds neat, but could also introduce some trickiness of making sure a new build of the source container is completed before trying to build containers that would be copying from it.

I initially wondered if symlinks in the git repo could fix this issue, but then I rediscovered upstream discussion of why that's fragile and unimplemented. We do have the build.py script to manage image building here, so we could in theory add our own "copy files from $DIR before building" functionality to go along with the existing Dockerfile generation step.
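As a rough illustration of the "copy files from $DIR before building" idea, the staging step amounts to something like the shell sketch below. build.py is a Python script, so this only shows the file operation, not how it would be wired in; the function name and the shared/ directory layout are assumptions.

```shell
# Sketch only: copy shared files into one image's build context before
# the build runs. Function name and directory layout are hypothetical.
stage_shared_files() {
    local shared_dir="$1"   # directory holding files common to all images
    local image_dir="$2"    # build context of a single image

    mkdir -p "$image_dir/shared"
    # Copy the directory contents, preserving modes so that scripts
    # such as the runner stay executable inside the image.
    cp -Rp "$shared_dir/." "$image_dir/shared/"
}
```

Each Dockerfile could then COPY the runner from its local shared/ directory, which avoids both symlinks in the repo and the ordering problem of copying from another published image.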

• bd808 mentioned this in rODITe299db6e3481: Add php82 images.Jul 24 2023, 8:36 PM

taavi moved this task from In Progress to Ready to be worked on on the Toolforge board.Nov 7 2023, 2:21 PM

Change 983520 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/docker-images/toollabs-images@master] shared: lighttpd: fix override file path

https://gerrit.wikimedia.org/r/983520

gerritbot added a project: Patch-For-Review.Dec 16 2023, 4:18 PM

Change 983520 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] shared: lighttpd: fix override file path

https://gerrit.wikimedia.org/r/983520

Mentioned in SAL (#wikimedia-cloud) [2023-12-16T20:54:30Z] <bd808> Rebuilding all containers to pick up lighttpd config fix and normal package updates (T293552)

Maintenance_bot removed a project: Patch-For-Review.Dec 16 2023, 9:10 PM

taavi mentioned this in rODIT365b1688d5f9: shared: lighttpd: fix override file path.Dec 16 2023, 9:26 PM

taavi mentioned this in T355231: Create Bookworm-based standalone webservice image.Jan 18 2024, 1:08 PM

dcaro triaged this task as Low priority.Jan 24 2024, 3:51 PM

dcaro moved this task from Ready to be worked on to Workspace for triaging whenever needed on the Toolforge board.Jan 24 2024, 3:53 PM

taavi updated the task description.Feb 9 2024, 2:33 PM

dcaro moved this task from Workspace for triaging whenever needed to Ready to be worked on on the Toolforge board.Feb 21 2024, 4:03 PM

Change 1005952 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/docker-images/toollabs-images@master] Convert remaining images to shell webservice-runner

https://gerrit.wikimedia.org/r/1005952

gerritbot added a project: Patch-For-Review.Feb 23 2024, 10:16 AM

taavi added a parent task: T358320: [toolforge-webservice] Remove old webservice-runner code.Feb 23 2024, 10:18 AM