
[toolforge] simplify calling the different toolforge apis from within the containers
Stalled, Medium, Public

Description

Currently we don't provide anything specific for users to be able to connect to the different toolforge APIs from within jobs and webservices; to do so, you have to make a bare HTTP request like:

tools.wm-what@wm-what-59d64fb59c-q6wm9:~$ curl -v \
  -X POST \
  -H "content-type: application/json" \
  --data '{"name": "test", "imagename": "python3.11", "cmd": "env", "mount": "all"}' \
  --insecure \
  --cert $TOOL_DATA_DIR/.toolskube/client.crt \
  --key $TOOL_DATA_DIR/.toolskube/client.key \
  https://api.svc.tools.eqiad1.wikimedia.cloud:30003/jobs/api/v1/jobs/

This would need to be implemented differently in both the pre-built images and the buildservice-generated ones.

Pre-built images

  • Install the packages always
    • Would mean that we have to regenerate the images whenever we release a new package
    • Should be easy to add them
  • Let users install them per-environment (not sure it makes sense, see the sketch after this list); this is easy for Python (pip install) but other languages might not be doable (we would need to provide many bindings)
    • We don't have a nice way of doing this (having single-binaries would help)
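
For illustration, the per-environment route would ideally be as simple as something like the following; the package name is hypothetical, it only sketches the intended user experience:

pip install toolforge-jobs-client   # hypothetical package name, for illustration only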

Buildservice

  • Install the packages always and automatically:
    • Using a specific buildpack
      • Might still require us to build Ubuntu packages
      • We might be able to work around that with venvs + scripts
      • If we had single-binary deployments this would be way easier xd
  • Hardcoding it inside the apt-buildpack somehow (probably not, as it only triggers when there is an Aptfile)
    • Same as the next point
  • Let the user specify the packages in the Aptfile (see the example after this list)
    • This means enabling toolforge repos (easy)
    • This will require rebuilding the packages for Ubuntu Jammy and exposing the repository (a bit more troublesome)
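
For the Aptfile option, a sketch of what a user might put in their Aptfile; the package names below are illustrative and would only resolve if the Toolforge repository were enabled for the build and the packages rebuilt for the base image:

# Aptfile -- one package per line, installed by the apt buildpack
# (package names are illustrative)
toolforge-cli
toolforge-jobs-framework-cli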

Event Timeline

dcaro created this task.
dcaro edited projects, added Toolforge; removed Grid-Engine-to-K8s-Migration.
dcaro added a subscriber: dschwen.

See T319953#9385479 for a synopsis of panoviewer's operation. Note that zoomviewer is very similar, and migrating it is tracked by T320210. Both tools have a PHP webservice which queues a job. The job runs a binary and writes the result to NFS. The jobs will both need a buildpack but the webservices don't need anything special except the ability to queue a job. The current plan is for the webservice to use a standard image, and for the job alone to use the buildpack.

Neither tool uses Composer, so using a PHP library for queuing jobs, if such a library existed, would add complexity to the tools.

The work I've done on T319953 is already beyond what I would expect from a volunteer, so I'm very much interested in simplifying the process so that it remains feasible for these kinds of tools to be written and maintained by the community. One way or another, I think it should just work.

If it's a problem to rebuild the images every time TJF is released, then you could install the packages on the host in a virtual environment, and bind-mount it into the container. In the container, have a wrapper in /usr/local/bin or modify the PATH.

Note that the container would need to get a client certificate somehow, ideally without relying on NFS.
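
For illustration, the in-container wrapper could be as small as this; the mount point and CLI entry point are assumptions:

#!/bin/sh
# Hypothetical /usr/local/bin/toolforge wrapper inside the container,
# assuming the host-side virtualenv is bind-mounted at /opt/toolforge-venv.
exec /opt/toolforge-venv/bin/toolforge "$@"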

I'd say the quicker and more stable solution right now is just to make the call yourself from the language you are using.

The call to create a job would be the equivalent of:

curl -v \
  -X POST \
  -H "content-type: application/json" \
  --data '{"name": "test", "imagename": "python3.11", "cmd": "env", "mount": "all"}' \
  --insecure \
  --cert $TOOLS_DATA_DIR/.toolskube/client.crt \
  --key $TOOLS_DATA_DIR/.toolskube/client.key \
  https://api.svc.tools.eqiad1.wikimedia.cloud:30003/jobs/api/v1/jobs/

It might take some time for us to provide an easier option (will happen, but might not be there before you have to move out of the grid).

I could make a shell script that works like toolforge-cli and have the tool run it, that'll make it easier to migrate to the real CLI once it is available.

--cert $TOOLS_DATA_DIR/.toolskube/client.crt \

There is no TOOLS_DATA_DIR environment variable. I tested it yesterday; it's broken. lighttpd is not passing arbitrary environment variables down to PHP.

That should have been fixed :/ I did test it, let me recheck: T354320: [webservice] php 7.4 containers don't pass through the environment variables to the scripts

:facepalm: it's a typo, it's $TOOL_DATA_DIR, not $TOOLS_DATA_DIR, let me give you a tested curl example

It's OK, the container just hadn't been restarted since that task was fixed. I see the right environment now.

ack, btw. the above example works when using the right envvar :)

tools.wm-what@wm-what-59d64fb59c-q6wm9:~$ curl -v \
  -X POST \
  -H "content-type: application/json" \
  --data '{"name": "test", "imagename": "python3.11", "cmd": "env", "mount": "all"}' \
  --insecure \
  --cert $TOOL_DATA_DIR/.toolskube/client.crt \
  --key $TOOL_DATA_DIR/.toolskube/client.key \
  https://api.svc.tools.eqiad1.wikimedia.cloud:30003/jobs/api/v1/jobs/
...
{"name": "test", "cmd": "env", "image": "python3.11", "image_state": "stable", "filelog": "False", "filelog_stdout": null, "filelog_stderr": null, "status_short": "Unknown", "status_long": "No pods were created for this job.", "emails": "none", "retry": 0, "mount": "all"}

dcaro triaged this task as Medium priority. Feb 7 2024, 10:10 AM
dcaro renamed this task from [toolforge] allow calling the different toolforge apis from within the containers to [toolforge] simplify calling the different toolforge apis from within the containers. Feb 19 2024, 4:08 PM
dcaro updated the task description.
dcaro changed the task status from Open to Stalled. Tue, Apr 30, 1:30 PM
dcaro moved this task from Next Up to Blocked/Paused on the Toolforge (Toolforge iteration 09) board.