
Missing Perl packages on dev.toolforge.org for anomiebot workflows
Closed, Resolved · Public · BUG REPORT

Description

Per your email, I find that libjson-perl is missing from the bastion. One of my scripts for submitting jobs for my tool uses this to parse JSON to know what needs submitting.

Another one of my scripts needs libdbi-perl:amd64 for database access.

Another needs libwww-perl to hit an HTTP-based API.

I suspect there will turn out to be more missing, but it's hard to tell since each script dies on the first missing package. Here's the full list of Perl packages my code uses that seem to now be missing:

  • libbytes-random-secure-perl
  • libcrypt-gcrypt-perl
  • libdbi-perl:amd64
  • libdigest-crc-perl
  • libhtml-parser-perl
  • libhttp-message-perl
  • libjson-perl
  • libnet-oauth-perl
  • libpod-simple-wiki-perl
  • libredis-perl
  • liburi-perl
  • libwww-perl
  • libxml-libxml-perl

I don't know that I'd need _all_ of those on the bastion, but probably most of them.
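Since each script dies on the first missing module, one way to enumerate everything missing in a single pass is to probe each module with `perl -M`. A minimal sketch, not the bot's actual code; the module names are the ones the Debian packages above provide (partial list, illustrative):

```shell
# Probe each Perl module; print only the ones that fail to load.
for mod in JSON DBI LWP::UserAgent HTML::Parser XML::LibXML Redis; do
  perl -M"$mod" -e1 2>/dev/null || echo "missing: $mod"
done
```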

Event Timeline

@Anomie, can you run the scripts that need these Perl libraries from inside of a webservice perl5.32 shell container session? Or do they also need kubectl, toolforge-jobs, or similar software that is currently only available on the bastions?

bd808 renamed this task from Missing packages on dev.toolforge.org to Missing Perl packages on dev.toolforge.org for anomiebot workflows. Mar 20 2024, 12:13 AM

Something that is probably under-advertised, related to my webservice perl5.32 shell question, is that webservice passes extra CLI args into the container's shell, which allows things like:

$ webservice perl5.32 shell -- perl -v

This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi
(with 48 registered patches, see perl -V for more detail)

Copyright 1987-2021, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

This might be handy if there are things like shell scripts that need to run on the bastion which need some data from a script that needs to run in a container, assuming that data can be passed as captured command output from a subshell.

> webservice perl5.32 shell

Just noting what we talked about, that "toolforge-shell" would be a lot more discoverable as a command name for that. 🙂

> Or do they also need kubectl, toolforge-jobs, or similar software that is currently only available on the bastions?

I think I could work around the places that use kubectl in the same way I already do in some other places to hit the API directly when in a container. See also T321919 on that topic.

One script does need toolforge-jobs. I don't know what APIs would be needed to work around that one, or if they'd need further access reconfiguration.

Also, Perl has this neat thing where, in my local dev environment, I can configure the DBI DSN as dbi:Gofer:transport=stream;url=ssh:login.toolforge.org;dsn=DBI:mysql:mysql_read_default_file=/home/anomie/replica.my.cnf;host=$WIKI.$SVC.db.svc.wikimedia.cloud;database=$WIKI_p and it'll ssh into the bastion to connect to the replica DB from there, instead of my having to develop directly in Toolforge or constantly sync code to be able to test it. Possibly I could set up my ~/.ssh/config with an entry that would use ProxyCommand to ssh→become→webservice shell→sshd -i though...

> Possibly I could set up my ~/.ssh/config with an entry that would use ProxyCommand to ssh→become→webservice shell→sshd -i though...

First attempt using ProxyCommand seemed like it almost worked, except that needs to end with connecting to an sshd somewhere (either one already listening on a port or run as sshd -i) and there isn't an sshd on the exec node for it to connect to.
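For reference, the attempted chain in ~/.ssh/config would look roughly like this. This is a hypothetical sketch, not a working config: the host alias is a placeholder, and as the comment above found, the final step fails because there is no sshd on the exec node for the connection to terminate at.

```
# Hypothetical ~/.ssh/config entry for the ssh -> become -> webservice shell chain.
# Fails in practice: the container image has no sshd for the final "sshd -i" step.
Host anomiebot-container
    ProxyCommand ssh login.toolforge.org "become anomiebot webservice perl5.32 shell -- /usr/sbin/sshd -i"
```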

bd808 triaged this task as High priority. Apr 1 2024, 8:46 PM
bd808 added a project: cloud-services-team.
bd808 moved this task from Inbox to Needs discussion on the cloud-services-team board.

Dropping this into the "needs discussion" column for cloud-services-team as a blocker to decommissioning the remaining Buster bastions.

> Dropping this into the "needs discussion" column for cloud-services-team as a blocker to decommissioning the remaining Buster bastions.

I see that, despite this, the main bastion at login.toolforge.org has been "updated" today and my scripts are now broken. 🙁

> Dropping this into the "needs discussion" column for cloud-services-team as a blocker to decommissioning the remaining Buster bastions.

> I see that, despite this, the main bastion at login.toolforge.org has been "updated" today and my scripts are now broken. 🙁

The announce email for switching login.toolforge.org to the new bastion style included a note that login-buster.toolforge.org can still be used by folks who need a "fat" bastion. The messaging could probably have been more proactive, with the intention to change announced a day or so before actually changing to give more folks a chance to see it, but time machines are in short supply at the moment.

I can't say that having to change various references to login.toolforge.org in my stuff to login-buster.toolforge.org (which seems like it will only work temporarily) seems like a very good solution.

It's pretty important that Anomie's stuff works. It would be great if it was considered a priority to help resolve this issue.

I have a mostly working solution for this issue in a custom container image created with the Toolforge build service, but it currently needs T356016: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) to be fixed to make the container easier to use.

> I have a mostly working solution for this issue in a custom container image created with the Toolforge build service, but it currently needs T356016: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) to be fixed to make the container easier to use.

The shell script workaround does not help there?

> I have a mostly working solution for this issue in a custom container image created with the Toolforge build service, but it currently needs T356016: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) to be fixed to make the container easier to use.

> The shell script workaround does not help there?

It might actually. I somehow missed that adding the shell script indirection then allows passing $@ through to the wrapped commands again. I will try making a generic wrapper and see if I can get that to do the needful.
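The "$@" passthrough mentioned above can be reduced to a one-line generic wrapper script that a Procfile entry points at. A sketch under assumptions: the script path and the echo command below are placeholders, not the actual entrypoints from the repo.

```shell
# Generic wrapper: a Procfile entry can point at a script like this,
# which simply re-execs its arguments, restoring "$@" passthrough
# that direct Procfile commands lose.
cat > /tmp/wrap-demo <<'EOF'
#!/bin/sh
exec "$@"
EOF
chmod +x /tmp/wrap-demo
/tmp/wrap-demo echo hello from wrapper
```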

@bd808 were you able to try it? I had a quick look at the container image repo, but it seems it's more than just one thing (some kind of templating system?) so I was not sure how to hook things in it.

> @bd808 were you able to try it?

I have not yet made enough time to try adding wrapper scripts for the Procfile entrypoints. I did get far enough in reading to understand that what T356016#9542120 describes will require a separate launcher script for each entrypoint ("If you pass any other parameters, it will not wrap the command in a shell, but instead, try to execute it as one binary").

> I had a quick look at the container image repo, but it seems it's more than just one thing (some kind of templating system?) so I was not sure how to hook things in it.

The src/... content is the Python launcher skeleton I have dreamed up for use with some other projects like https://gitlab.wikimedia.org/toolforge-repos/containers-bnc where I found a need to hack in a runtime setup step (usually generating a config file) to make things work as hoped. In this particular repo that code is not actually being used, but it was a convenient scaffold to hang the Python package installs off of.

My design goal for this project is for the user to be able to interact with the container in such a way that it feels mostly like using the various toolforge ... and webservice commands on the bastion, but with the added benefit of having access to a Perl runtime with various libraries installed. The part I have been struggling with is actually getting things like toolforge jobs run to work within the container without requiring the user to use launcher in the command or to enter a special subshell with launcher bash. I am currently not sure if this goal of container transparency is actually possible with our available stack.

https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/1 is my work-in-progress feature branch. The branch uses Aptfile to install a number of utilities and Perl libraries. pyproject.toml is used to install Toolforge cli tools from their gitlab origin repos. Procfile tries to put various commands into the default $PATH, but fails to actually do this in a way that is functional for the Python scripts. This is where I thought that T356016: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) was needed, but at this point I am not sure if that would actually help either. I probably need to make myself another feature branch that avoids T369563: fagiani/apt buildpack very slow when processing a large collection of packages so I can do some quicker iteration on potential solutions and thus better describe the problems I have encountered.
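For context, the Aptfile consumed by the apt buildpack is simply a list of Debian package names, one per line. A minimal sketch covering a few of the packages requested in this task (not the branch's actual file):

```
libjson-perl
libdbi-perl
libwww-perl
libxml-libxml-perl
```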

My understanding is that David's webservice changes above allow using a custom-built image with the shell subcommand, which in turn unblocks whatever changes are necessary to the anomiebot tool to run properly without the "full" shell environment present on the old bastion. Is that correct? Are there any other changes necessary from the infrastructure side for this to happen?

Asking because the old Buster bastion is very rapidly becoming the very last remaining Buster machine anywhere in the Wikimedia environments, and at that point I do not want to be the one blocking various cleanups at least without a very clear roadmap of what is missing and a timeline for making that happen.

> My understanding is that David's webservice changes above allow using a custom-built image with the shell subcommand, which in turn unblocks whatever changes are necessary to the anomiebot tool to run properly without the "full" shell environment present on the old bastion. Is that correct? Are there any other changes necessary from the infrastructure side for this to happen?

As I understand it, webservice buildservice shell now enters via launcher which should make scripting things through that entrypoint easier. It has been so long since I worked on the larger problem that I have forgotten if this was the main blocker remaining or not.

The repo I was working with probably needs to be updated so it can build with --use-latest-versions, so we can make sure things work with the different apt builder that option brings.

> I do not want to be the one blocking various cleanups at least without a very clear roadmap of what is missing and a timeline for making that happen.

I have empathy for this point of view, but I am not in a position to make this my top priority work stream.

bd808@mbp03:~$ ssh dev.toolforge.org
bd808@tools-bastion-12:~$ become bd808-buildpack
tools.bd808-buildpack@tools-bastion-12:~$ toolforge build start --image-name perl-bastion --ref main --use-latest-versions https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion
...
[step-results] 2025-08-19T00:30:44.404586544Z Built image tools-harbor.wmcloud.org/tool-bd808-buildpack/perl-bastion:latest@sha256:19768c540d9228e6ef75265d85aba09f81f2127fa607270158aff53d5b8a4ce9
tools.bd808-buildpack@tools-bastion-12:~$ exit
bd808@tools-bastion-12:~$ sudo become anomiebot
tools.anomiebot@tools-bastion-12:~$ webservice --buildservice-image tool-bd808-buildpack/perl-bastion:latest --mount all buildservice shell -- kubectl version
Client Version: v1.29.15
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.15
tools.anomiebot@tools-bastion-12:~$ webservice --buildservice-image tool-bd808-buildpack/perl-bastion:latest --mount all buildservice shell -- kubectl auth whoami
ATTRIBUTE   VALUE
Username    anomiebot
Groups      [toolforge system:authenticated]
tools.anomiebot@tools-bastion-12:~/bot$ webservice --buildservice-image tool-bd808-buildpack/perl-bastion:latest --mount all buildservice shell -- /usr/bin/env perl -I /layers/heroku_deb-packages/packages/usr/share/perl5 -I /layers/heroku_deb-packages/packages/usr/lib/x86_64-linux-gnu/perl5/5.38 -w '$TOOL_DATA_DIR/bot/tools-startbot.pl' -- '$TOOL_DATA_DIR/bot/tasks/'
Job anomiebot-2 is already running on anomiebot-2-xlpvg
Job anomiebot-3 is already running on anomiebot-3-kzhqn
Job anomiebot-4 is already running on anomiebot-4-jjfm8
Job anomiebot-5 is already running on anomiebot-5-dm5gf
Job anomiebot-7 is already running on anomiebot-7-qgsnr
Job anomiebot-200 is already running on anomiebot-200-4xb4x
Job anomiebot-500 is already running on anomiebot-500-zv6ng
Job anomiebot-999 is already running on anomiebot-999-lgr2g

Working on this via the buildpack system feels like a never ending game of whack-a-mole. Everything is about finding a hack to make the container work with the limited customization options that the builder stack provides. The current mole that needs to be whacked is setting PERL5LIB=/layers/heroku_deb-packages/packages/usr/share/perl5:/layers/heroku_deb-packages/packages/usr/lib/x86_64-linux-gnu/perl5/5.38 in the runtime environment. This would remove the need for the -I include path modifications. That should be possible via toolforge envvars, but it will also be a tiny bit fragile as the installed Perl version is embedded in the path. The Perl version should only change when we update to a newer base image. The custom PERL5LIB value is needed because the apt buildpack installs things outside of the normal locations which have been compiled into the perl binary upstream.

A more sustainable long term fix from the point of view of the Toolforge platform would be @Anomie changing their bot to be a more "native" Toolforge app built into its own container. This may not be a simple conversion. Over the years Brad has built up a very complex and robust bot framework which is not a classically Kubernetes native design. I assume he would also like to retain the ability to run the bot outside of Toolforge when needed.

bd808 removed bd808 as the assignee of this task. Sep 9 2025, 11:42 PM

I need to unlick this cookie for now. If anyone wants to take over https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion I would be happy to help you do so. I think things are in an at least semi-working state for @Anomie with that container.

I think we should really shut down the Buster bastion host, as Buster has been EOL for more than 1 year now.

If the current state of the container-based bastion is not semi-working enough, what about creating a new Trixie-based bastion host dedicated to @Anomie where all the necessary packages can be installed?

Repeating my question from the last time: are there any remaining infrastructure blockers to making this happen?

If not, I don't think it is reasonable for us to spend the effort to re-create a "full" bastion on a newer OS release.

At this point I have things mostly working using webservice perl5.40 shell to run things, after replacing some code that shelled out to kubectl or toolforge-jobs to use APIs instead.

The following remaining issues aren't really blockers, just annoying:

  • T321919#11126898 - Working around that by continuing to use the Kubernetes API rather than the Toolforge API to fetch the list of running jobs.
  • T403295 - I don't get status or error output from that workflow thanks to the bug, but I can still manually check if things started/restarted correctly.
  • T403286 - I put a sleep 1 at the start of a bunch of things to delay the output, which is usually enough to avoid the bug at the cost of things being 1s slower to do.

Thanks. In that case I'm moving forward with retiring the ancient grid bastion VM.