
[jobs-api] current image aliases
Open, Medium, Public

Description

When starting jobs with the toolforge jobs CLI, I think there should be names for the latest language-specific images, e.g. php-latest, node-latest. Also, the latest Debian base image could be base-latest.

The point of this is to allow the tool author to opt in to automatic migration to a new version of a distribution or language, when it becomes available.
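The requested alias could work roughly like this. The sketch below is a minimal illustration, not the jobs-api implementation. It assumes the platform publishes versioned image names of the form `<language><version>`. The catalogue entries (`php7.4`, `python3.11`, ...) and the `resolve_image` helper are hypothetical. A `-latest` name is resolved to the newest matching version each time a job starts, while pinned names pass through unchanged.

```python
import re

# Hypothetical catalogue of versioned image names; the real Toolforge list may differ.
AVAILABLE_IMAGES = ("php7.4", "php8.2", "node16", "node18", "python3.9", "python3.11")

# '<language><version>', e.g. 'python3.11' -> ('python', '3.11').
_VERSIONED = re.compile(r"^([a-z]+)(\d+(?:\.\d+)*)$")


def _version_key(version: str) -> tuple[int, ...]:
    """Turn '3.11' into (3, 11) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def resolve_image(name: str, images: tuple[str, ...] = AVAILABLE_IMAGES) -> str:
    """Map '<language>-latest' to the newest matching image; pass pinned names through."""
    if not name.endswith("-latest"):
        if name not in images:
            raise ValueError(f"unknown image: {name}")
        return name
    language = name[: -len("-latest")]
    candidates = []
    for image in images:
        match = _VERSIONED.match(image)
        if match and match.group(1) == language:
            candidates.append((_version_key(match.group(2)), image))
    if not candidates:
        # No silent default: an unknown language is an explicit error.
        raise ValueError(f"no images available for {language!r}")
    # The highest version tuple wins.
    return max(candidates)[1]


print(resolve_image("python-latest"))  # python3.11
print(resolve_image("php8.2"))         # php8.2
```

Numeric version keys matter here because comparing the strings directly would rank `3.9` above `3.11`. The resolver only selects a version. It does nothing to check that the tool still works on that version, which is the compatibility concern raised in the discussion.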

Different tools have different backwards compatibility considerations. For complex, actively-maintained tools, the author may want to review the tool to ensure it is compatible with the new version prior to migration. But for simple passively-maintained tools, the author may want to declare that they do not want to do any such review prior to migration. My impression is that most tools are passively maintained.

My motivating use case is a one-line shell script which is likely to work in any current or future image. I don't want to receive an email every two years telling me that the image name needs to be updated. I just want to be told if the tool is broken.

Event Timeline

aborrero triaged this task as Medium priority. Feb 13 2024, 10:07 AM
aborrero added a project: User-aborrero.

We discussed this at length during our toolforge council meeting. We considered two options, neither of which is very popular.

  • Add a default image selection for when --image is not specified

This is concerning because it would trip up most new users. Almost no one actually wants a default empty image, yet the failure mode here would be more cryptic than the current failure.

  • Add an alias that looks something like --image latest-debian-python

The value of this depends on how much we believe that an automatic platform upgrade will or won't work. Our general suspicion is that it will /sometimes/ work but in many cases will produce a broken tool (a dramatic example of this is the deprecation of python2). It seems better to have a tool maintainer present when that happens, rather than have a tool spontaneously break due to routine SRE maintenance.

Our experience is that simply notifying users that a thing is happening (without requiring user action) almost never works; users are great at ignoring that kind of thing.


The ideal solution for this issue is to move almost everything into GitLab push-to-deploy so that command-line concerns are moot. That doesn't solve the upgrade issue, but it does solve the cryptic command-line issue.

Thanks for your thoughts, @tstarling; we're still interested in hearing more from you on this topic. Unfortunately, as @bd808 says, long-term tool maintenance is more of a social issue than a technical issue.

The value of this depends on how much we believe that an automatic platform upgrade will or won't work. Our general suspicion is that it will /sometimes/ work but in many cases will produce a broken tool (a dramatic example of this is the deprecation of python2). It seems better to have a tool maintainer present when that happens, rather than have a tool spontaneously break due to routine SRE maintenance.

I understand the sentiment, but toolforge jobs is the only place where I've had to type a version or Debian codename in the course of the migration of these two tools (panoviewer and zoomviewer). No image name is needed for web services. No base image name is needed for build service images. For consistency, toolforge jobs should not require a distribution.

The first thing I did when I started work on panoviewer was to remove the Debian codename from the jsub command line, because it was specifying "trusty" and so had been broken for about two years. Having users specify versions does not magically turn them into diligent maintainers. The next version is always going to be better than nothing.

If you have users specify a distro or language version in a command line, you put that tool on a path towards scheduled breakage, and there's nothing toolforge admins can do to make it keep working, short of logging in and editing the source.

I don't know why dschwen specified the distro in that jsub command. I think tool developers either do not think about long-term maintenance, or they overestimate their commitment.

Let's review a random selection of the crontabs still active on sgecron-2 to see what their dependencies are.

  • hashtagwatcher: uses pywiki.
  • wmf-sitematrix: A short node.js script depending only on 'fs'.
  • global-search-pro: A 3-line python script which periodically runs the shell command webservice restart.
  • gnubotmarcoo: uses pywikibot.
  • archive-things-4: shell, wget, curl. Incidentally, this is not Wikimedia-related.
  • pagecounts: PHP
  • bothasava: shell only
  • fvcbot: Python venv (dependencies on NFS)
  • kanzatcopyvio: java
  • arbclerkbot: python, mwclient, mwparserfromhell

OK, so 7/10 need a language runtime of some kind, and 8/10 would fail on the base image which doesn't even have wget.

I'll change my request to just $language-latest, without the default.

tstarling renamed this task from toolforge jobs current image alias to toolforge jobs current image aliases. Feb 13 2024, 10:49 PM
tstarling updated the task description.

Having users specify versions does not magically turn them into diligent maintainers.

Agreed, though the problem here is not a tool ceasing to work (most tools will stop working anyway when we force-move them to newer images). The root problem is maintainership: if every few years the maintainer is not willing to check the tool and move it to a newer image, I would declare it unmaintained.

OK, so 7/10 need a language runtime of some kind, and 8/10 would fail on the base image which doesn't even have wget.

That means that 7/10 will not work, or will fail in subtle ways, when the image is automatically moved from one version of the runtime to the next (which brings us back to this being a social issue).

For the 3/10 that don't need a language runtime, they can still use a build service image with an Aptfile listing the packages they need (wget/curl/jq/...) instead of a pre-built image. Currently they would also need an empty 'requirements.txt', 'composer.json' or similar, but that can be arranged. That way there's a better chance that a rebuild on a future distro will still work, and future maintainers (and operators) can see the tool's dependencies explicitly. Yes, some of those packages might not be available in future distros, but then at least we would know beforehand, with high certainty, that the tool will break.
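The pre-upgrade check described above could be sketched as follows. This is an illustration under stated assumptions, not an existing Toolforge feature. It assumes an Aptfile lists one Debian package name per line, and it treats skipping blank lines and `#` comments as an assumption. It also assumes the set of package names offered by the target distribution is already known. The helper names and the `legacy-helper` package are hypothetical. A real check would read the target release's package index.

```python
def parse_aptfile(text: str) -> list[str]:
    """Return the package names declared in Aptfile content, one per line."""
    packages = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comments carry no package names.
        if not line or line.startswith("#"):
            continue
        packages.append(line)
    return packages


def missing_packages(aptfile_text: str, available: set[str]) -> list[str]:
    """List declared packages that the target distribution does not provide."""
    return [pkg for pkg in parse_aptfile(aptfile_text) if pkg not in available]


# Hypothetical inputs: a tool's Aptfile and the package names in a future release.
aptfile = """
# tools used by the cron script
wget
curl
jq
legacy-helper
"""
next_release = {"wget", "curl", "jq"}

print(missing_packages(aptfile, next_release))  # ['legacy-helper']
```

A non-empty result flags the tool as likely to break before the upgrade happens, rather than after.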

The same applies to the other tools: if they use the build service, they have to declare their language dependencies explicitly, which lets us know before an upgrade whether or not those tools will break.

And at the same time, the tool user does not have to change the image name every time as you request :)

Would that be acceptable for you?

Would that be acceptable for you?

Well, it's not my product, and ultimately I don't get to choose the architectural direction which was mostly set years ago.

I would not try to push new users towards the build service at present, due to the greater complexity, the slow test/debug cycle, and the bugs that I encountered. You know, I spent more time working through build service issues than dschwen spent writing the tools in the first place.

The old system was feasible as a development and test platform, whereas the new system is very tedious in that role and really needs to be complemented by a VPS or local install. A few small changes would improve the situation greatly, such as making NFS the default current directory for jobs and the default document root for web services, but I know that's not where the team is headed.

Obviously NFS has its problems (it's a pain to support), but it is a simple mental model for new users.

In other words, the build service has a number of problems, the most intractable of which is that it gets users on a path towards NFS deprecation by bundling the tool source into the image. So if users have the option of running jobs without using the build service, that's what I'd tell them to do.

dcaro renamed this task from toolforge jobs current image aliases to [toolforge,jobs] current image aliases. Feb 21 2024, 4:22 PM
dcaro edited projects, added Toolforge; removed Toolforge Jobs framework.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.
dcaro renamed this task from [toolforge,jobs] current image aliases to [jobs-api] current image aliases. Mar 5 2024, 4:10 PM